Chapter 11 Native Code

A Java virtual machine commonly needs access to various native functions in order to interact with the outside world. For instance, all the low-level graphics functions, file access functions, networking functions, or other similar routines that depend on the underlying operating system services typically need to be written in native code.

The way these native functions are made available to the Java virtual machine can vary from one virtual machine implementation to another. In order to minimize the work that is needed when porting the native functions, the Java Native Interface (JNI) standard [1] was created. The Java Native Interface generally serves two purposes: 1) it serves as a common interface for virtual machine implementers so that the same native functions will work unmodified with different virtual machines; 2) it provides Java-level APIs that make it possible for a Java programmer to dynamically load libraries and access native functions in those libraries.

Unfortunately, because of its general nature, JNI is rather expensive and introduces a significant memory and performance overhead to native function calls. Also, the ability to dynamically load and call arbitrary native functions from Java programs could pose security problems in the absence of the full Java 2 security model.

KVM does not support the Java Native Interface (JNI). Rather, KVM supports an interface called K Native Interface (KNI), which implements a logical subset of JNI that is significantly more efficient in terms of performance and memory consumption. In addition, KVM also supports an older interface that allows native functions to be added to the KVM in a VM-specific fashion. Information on KNI is provided below in Section 11.1 “Using the K Native Interface (KNI).” Information for writing native functions using the old-style native interface is provided in Section 11.2 “Implementing old-style native methods.”

11.1 Using the K Native Interface (KNI)

Starting from KVM 1.0.4, KVM has a new interface for writing native functions. The high-level goal of this new interface, K Native Interface (KNI), is to allow native functions to be added to the KVM (and other small-footprint virtual machines) in a manner that is both highly efficient and fully independent of the internal structures of the virtual machine. The K Native Interface (KNI) Specification (Sun Microsystems, Inc., 2001), referred to below as the KNI Specification, defines a logical subset of the Java Native Interface (JNI) that is well-suited for low-power, memory-constrained devices. KNI follows the function naming conventions and other aspects of the JNI as far as this is possible and reasonable within the strict memory limits of CLDC target devices and in the absence of the full Java 2 security model. Since KNI is intended to be significantly more lightweight than JNI, some aspects of the interface, such as the parameter passing conventions, have been completely redesigned and are significantly different from JNI.

The K Native Interface is described in more detail in the KNI Specification. Please refer to this specification to learn about the K Native Interface and its usage.

Things to remember when getting started with KNI: One of the key goals of the KNI is to isolate the native function programmer from the implementation details of the virtual machine. Instead of writing native functions using KVM-specific functions and data structures (as required by the old-style native interface described in Section 11.2 “Implementing old-style native methods”), KNI allows native functions to be written using a set of functions that operate identically and efficiently across a wide variety of virtual machines. To ensure portability of native code, the native function programmer shall not use any KVM-specific include files or KVM-specific functions or data types. Rather, the programmer must include the file “kni.h” and use functions and data types defined in that file.
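As a rough illustration of this style, the sketch below adds two int parameters. The names KNIEXPORT, KNI_RETURNTYPE_INT, KNI_GetParameterAsInt and KNI_ReturnInt are used as we understand them from the KNI Specification; check the specification before relying on them. The class name and the stand-in definitions at the top are hypothetical and exist only so that the fragment compiles without a KVM source tree. In a real port you would delete the stand-ins and include kni.h instead.

```c
#include <assert.h>

/* --- Stand-ins so this sketch compiles on its own (hypothetical). ---
 * In a real port, remove this section and write: #include <kni.h>    */
typedef int jint;
static jint mock_params[8];                  /* fake parameter slots   */
static jint mock_get_param(int index)
{
    assert(index >= 1 && index < 8);         /* slots are numbered from 1 */
    return mock_params[index];
}
#define KNIEXPORT
#define KNI_RETURNTYPE_INT        jint
#define KNI_GetParameterAsInt(i)  mock_get_param(i)
#define KNI_ReturnInt(v)          return (v)
/* ------------------------------------------------------------------- */

/* Java side (hypothetical):
 *     package example;
 *     class Adder { static native int add(int a, int b); }           */
KNIEXPORT KNI_RETURNTYPE_INT
Java_example_Adder_add(void)
{
    jint a = KNI_GetParameterAsInt(1);   /* first declared parameter  */
    jint b = KNI_GetParameterAsInt(2);   /* second declared parameter */

    KNI_ReturnInt(a + b);                /* hands the result to the VM */
}
```

Note that the function body touches no KVM-specific structure: parameters are read by index and the result is handed back through a single call.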

11.2 Implementing old-style native methods

Note – It is highly recommended that you use KNI for writing native functions for the KVM. The use of the old-style API is strongly discouraged for any purpose other than writing asynchronous native functions (Section 11.4 “Asynchronous native methods”).

WARNING: You should not write old-style native methods unless you have thoroughly read through the implementation and understand its structures. Most of the material in this porting guide is moderately straightforward. The material in this subsection is not!

Old-style native methods must be written extremely carefully. Inattention to detail will cause fatal errors in the virtual machine.

11.2.1 Include files

Your code containing old-style native functions should begin with the line

    #include <global.h>

which causes all include files that are part of KVM to be included. You might also need to #include additional files.

11.2.2 Accessing arguments from old-style native methods

When a native method is called, its arguments are on top of the Java stack. A static method’s arguments should be popped from the stack in the reverse order from which they were pushed. CODE EXAMPLE 3 shows an example of this coding style:

CODE EXAMPLE 3 Handling arguments of native static methods

Java code:

static native void drawRectangle(int x, int y, int width, int height);

Native implementation:

static void Java_com_sun_kjava_Graphics_drawRectangle() { 
    int height = popStack(); 
    int width =  popStack(); 
    int y =      popStack(); 
    int x =      popStack(); 
    windowSystemDrawRectangle(x, y, width, height); 
}

An instance method (non-static method) must pop the this argument off the stack after it has popped the rest of the arguments.

Note – Failing to pop the this argument in a native instance method will almost surely cause a fatal error in the virtual machine.
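The popping order for an instance method can be sketched as follows. This is a minimal sketch, not KVM code: the operand stack, popStack, popStackAsType and the INSTANCE layout are small stand-ins defined here so that the fragment compiles on its own, and the class and method names are hypothetical. In a real port the macros and types come from the KVM headers.

```c
#include <assert.h>
#include <stdint.h>

/* --- Stand-ins for the KVM operand stack (hypothetical). ----------- */
static intptr_t mock_stack[16];
static int      mock_sp;                      /* number of cells in use */
static intptr_t mock_pop(void)
{
    assert(mock_sp > 0);
    return mock_stack[--mock_sp];
}
static void mock_push(intptr_t value)
{
    assert(mock_sp < 16);
    mock_stack[mock_sp++] = value;
}
#define popStack()            ((long)mock_pop())
#define popStackAsType(type)  ((type)mock_pop())

struct toyPen { int lastX, lastY; };          /* toy object layout      */
typedef struct toyPen *INSTANCE;
/* -------------------------------------------------------------------- */

/* Java side (hypothetical):  native void moveTo(int x, int y);         */
static void Java_example_Pen_moveTo(void)
{
    int      y       = (int)popStack();           /* last argument first */
    int      x       = (int)popStack();
    INSTANCE thisObj = popStackAsType(INSTANCE);  /* 'this' is popped last */

    thisObj->lastX = x;
    thisObj->lastY = y;
}
```

After the function finishes, the stack holds nothing that belongs to this call, which is the property the note above insists on.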

Table 12 shows the macros that should be used to pop arguments off the stack:

TABLE 12 – Macros for popping arguments from the stack
C type	Macro for popping
`char`, `byte`, `int`, `long`	`popStack()`
`float`	`popStackAsType(float)`
`long64`, `ulong64`	`popLong()`
`double`	`popDouble()`
`pointerType`	`popStackAsType(pointerType)`

11.2.3 Returning a result from an old-style native function

If a native method returns a result, it must push that result onto the stack. The native code should use the appropriate macro shown in Table 13 to push the result back onto the stack:

TABLE 13 – Macros for pushing arguments onto the stack
C type	Macro for pushing
`char`, `byte`, `int`, `long`	`pushStack()`
`float`	`pushStackAsType(float)`
`long64`, `ulong64`	`pushLong()`
`double`	`pushDouble()`
`pointerType`	`pushStackAsType(pointerType)`
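As an illustration, a static native method that returns an int might look like the sketch below. The operand stack and the popStack and pushStack definitions are stand-ins written for this example so that it compiles without the KVM headers; the class and method names are hypothetical.

```c
#include <assert.h>

/* --- Stand-ins for the KVM operand stack (hypothetical). ----------- */
static long mock_stack[16];
static int  mock_sp;                          /* number of cells in use */
static long mock_pop(void)
{
    assert(mock_sp > 0);
    return mock_stack[--mock_sp];
}
static void mock_push(long value)
{
    assert(mock_sp < 16);
    mock_stack[mock_sp++] = value;
}
#define popStack()    mock_pop()
#define pushStack(v)  mock_push((long)(v))
/* -------------------------------------------------------------------- */

/* Java side (hypothetical):  static native int max(int a, int b);      */
static void Java_example_MathUtil_max(void)
{
    int b      = (int)popStack();        /* arguments in reverse order  */
    int a      = (int)popStack();
    int result = (a > b) ? a : b;

    pushStack(result);                   /* exactly one result cell     */
}
```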

11.2.4 Shortcuts

Some native code uses the macro topStack instead of popping the last argument off the stack. It then sets topStack to the value it wants to return.

This practice is not encouraged. It should only be used for “one-liners” that access the argument and return the value in a single statement. pushStack and popStack cannot be used in this case, since C would not guarantee their order of evaluation.

In general, it is safer to pop the value, perform the calculation, and push the value back onto the stack as three separate steps.
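The two styles can be compared side by side in the following sketch. The stack and the three macros are stand-ins defined for this example only (in KVM they are supplied by the VM headers), and the function names are hypothetical. In this stand-in the macros expand to ordinary function calls, so the ordering hazard described above does not arise here; it applies to the real macros, which modify the stack pointer directly.

```c
#include <assert.h>

/* --- Stand-ins for the KVM operand stack (hypothetical). ----------- */
static long mock_stack[16];
static int  mock_sp;                          /* number of cells in use */
static long mock_pop(void)
{
    assert(mock_sp > 0);
    return mock_stack[--mock_sp];
}
static void mock_push(long value)
{
    assert(mock_sp < 16);
    mock_stack[mock_sp++] = value;
}
#define popStack()    mock_pop()
#define pushStack(v)  mock_push((long)(v))
#define topStack      (mock_stack[mock_sp - 1])
/* -------------------------------------------------------------------- */

/* "One-liner" style: read the argument and overwrite it with the result. */
static void negate_shortcut(void)
{
    topStack = -topStack;
}

/* Recommended style: pop, compute, push as three separate steps. */
static void negate_three_steps(void)
{
    long value  = popStack();
    long result = -value;
    pushStack(result);
}
```

Both functions leave the stack at the same depth; the three-step form simply makes each stack access an explicit statement of its own.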

11.2.5 Callbacks

Native code cannot call back into Java code directly. Instead, KVM provides a mechanism by which native code can alter the interpreter state so that the interpreter begins executing a new piece of code. When that code finishes executing, the mechanism can designate a C function to be called next.

11.2.6 Exception handling in old-style native code

If the native code needs to throw an error or exception, it should call the function

void raiseException(const char* exceptionClassName);

where the exceptionClassName argument is the name of the exception class or error class to be thrown.

11.2.7 Useful functions in old-style native code

Other useful functions that a native method might need to call are the following:

void fatalError(const char* errorMessage);

The code calls this method to indicate that a serious error has occurred. The errorMessage argument is a brief explanation of the problem. This method does not return.

CLASS getClass(const char *name);

This method returns the class whose name is the indicated argument. You might want to coerce the return result to be an INSTANCE_CLASS or an ARRAY_CLASS.

STRING_INSTANCE instantiateString(const char* string, int length);

This method converts the given C string into a Java string.

char *getStringContents(STRING_INSTANCE string);

The string argument must be a Java string. It is converted into a null-terminated C string, and returned as the result.

The string is placed into a global buffer. If your code must hold onto this string for any length of time, you must copy the buffer into stack-allocated storage, or allocate space from the Java heap.

INSTANCE instantiate(INSTANCE_CLASS class);

Creates a new Java instance of the specified class.

ARRAY instantiateArray(ARRAY_CLASS arrayClass, long length);

Creates a Java array of the specified type and length.

SHORTARRAY createCharArray(const char* utf8stringArg, int utf8length,
                           int* unicodelengthP, bool_t is_permanent);

Creates a Java character array from the C string passed as an argument.

char* mallocBytes(long sizeInBytes);

Allocates a memory block in the garbage-collected heap that is big enough to hold sizeInBytes number of bytes. You can create a temporary root (Section 11.2.8 “Garbage collection issues”) to prevent the memory block from being garbage-collected.
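The warning about the global buffer used by getStringContents deserves an example. The sketch below compares two strings and copies the first result into stack-allocated storage before making the second call. It is a toy model: STRING_INSTANCE, the 64-byte buffer size and the body of getStringContents are stand-ins invented for this example, and sameName is a hypothetical helper. Only the copying pattern is the point.

```c
#include <assert.h>
#include <string.h>

/* --- Stand-ins (hypothetical): a "Java string" is plain C text here,
 * and the shared buffer size of 64 is an assumption. ----------------- */
typedef const char *STRING_INSTANCE;
static char mock_global_buffer[64];
static char *getStringContents(STRING_INSTANCE string)
{
    strncpy(mock_global_buffer, string, sizeof mock_global_buffer - 1);
    mock_global_buffer[sizeof mock_global_buffer - 1] = '\0';
    return mock_global_buffer;            /* same buffer on every call  */
}
/* -------------------------------------------------------------------- */

/* Returns 1 if both strings have the same contents, 0 otherwise. */
static int sameName(STRING_INSTANCE first, STRING_INSTANCE second)
{
    char  firstCopy[64];                  /* stack-allocated copy       */
    char *secondText;

    strncpy(firstCopy, getStringContents(first), sizeof firstCopy - 1);
    firstCopy[sizeof firstCopy - 1] = '\0';

    /* This call overwrites the shared buffer; firstCopy is unaffected. */
    secondText = getStringContents(second);

    return strcmp(firstCopy, secondText) == 0;
}
```

Without the copy, both pointers would refer to the same buffer and the comparison would always report equality.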

11.2.8 Garbage collection issues

The C stack is not scanned when the KVM performs a garbage collection. If your native code allocates new Java objects, you must take special precautions to prevent your new Java objects from being garbage collected inadvertently.

Since the release 1.0.2, KVM includes a compacting garbage collector. Any time that your native code performs an allocation, objects in the Java heap can move. This includes any arguments passed to your native function and any previous heap allocations performed by your native code.

Note – We strongly recommend that you do not write native methods that perform allocation from the Java heap. You greatly increase the chances that your code will have hard-to-find and hard-to-reproduce bugs.

Note – If, for example, you need to create a structure, it is better to create that structure in Java code, and pass it as an argument to the native code.

If your code must perform allocation, it is important that you:

Pop all arguments off the stack before you perform any allocation.

Push the return value (if any) onto the stack after you have performed any allocation.

The garbage collector can get erroneous results if an allocation occurs while an argument or return value is on the Java stack. The rest of this chapter describes how your code can interact correctly with the garbage collector.

11.2.8.1 Heap Space and Permanent Space

In order to simplify the garbage collector, the KVM’s memory is divided into two spaces: “permanent space” and “heap space”.

All objects created in permanent space are, well, permanent. These objects are

never freed by the garbage collector,

never scanned by the garbage collector to see if they contain pointers to other objects,

never relocated.

Among the objects that are allocated in permanent space are

class structures,

Java byte codes,

method tables,

field tables,

interned instances of java.lang.String (but not all strings).

These objects are never moved and never freed after they are created.

Structures that have a possibly limited lifetime are allocated in heap space. Among these are

all Java instances (except for interned Strings),

threads,

stack chunks.

These structures are liable to move any time an allocation occurs. Your code must follow the rules specified in the following subsections to ensure that it works correctly with the garbage collector.

11.2.8.2 Asserting no allocation

The KVM provides the two macros ASSERTING_NO_ALLOCATION and END_ASSERTING_NO_ALLOCATION, which are used as shown in CODE EXAMPLE 4:

CODE EXAMPLE 4 Forbidding garbage collection

ASSERTING_NO_ALLOCATION 
    /* non-allocating code */ 
END_ASSERTING_NO_ALLOCATION;

These macros are provided for use only in DEBUG mode to guarantee that no allocation is performed by the code between the ASSERTING... and the END_ASSERTING... macro.

If your code is compiled with INCLUDEDEBUGCODE set to a non-zero value, then any allocation inside the specified code causes a fatal error.

If you use the macros, make sure that the non-allocating code inside the macros does not perform a return. The macro END_ASSERTING_NO_ALLOCATION contains cleanup code that must be executed.

You are encouraged to use these macros to indicate safe regions of code in which heap-allocated objects will not move.
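One way to respect the no-return rule is to store the result in a variable declared outside the region and return only after the closing macro. The sketch below shows that pattern. The two macro bodies are stand-ins invented for this example (a simple counter) so that the fragment compiles on its own; the real definitions come from the KVM headers and may differ. The function name is hypothetical.

```c
#include <assert.h>

/* --- Stand-ins for the KVM macros (hypothetical bodies). ------------ */
static int mock_no_alloc_depth;           /* > 0 while inside a region  */
#define ASSERTING_NO_ALLOCATION      { mock_no_alloc_depth++;
#define END_ASSERTING_NO_ALLOCATION    mock_no_alloc_depth--; }
/* -------------------------------------------------------------------- */

/* Returns the index of 'wanted' in data[0..length-1], or -1 if absent. */
static int indexOfByte(const char *data, int length, char wanted)
{
    int result = -1;                      /* declared outside the region */
    int i;

    ASSERTING_NO_ALLOCATION
        for (i = 0; i < length; i++) {
            if (data[i] == wanted) {
                result = i;
                break;                    /* leave the loop; do NOT return */
            }
        }
    END_ASSERTING_NO_ALLOCATION

    return result;                        /* safe: cleanup has already run */
}
```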

11.2.8.3 Handles in old-style native functions

To deal with the fact that heap-allocated objects in the KVM can move, the garbage collector makes use of temporary “handles.” A handle is an indirect pointer to an object. Rather than being the address of the object itself, a handle is the address of a memory location that contains the address of the object.

The memory location that contains the address of the object must not itself be in the Java heap. In general, it is the address of a variable (for global roots) or the address of a location on the C stack (for temporary roots).

If the object is possibly in the Java heap, then the memory location that contains the address of the object must be registered with the garbage collector. It can either be a temporary root (see §11.2.8.4) or a permanent root (see §11.2.8.5).

If the object is not in the Java heap, then the handle does not need to be registered with the garbage collector.

All type names in the KVM that end with _HANDLE indicate handles. If a function has a handle as one of its arguments, the argument must be an indirect pointer, and must be registered with the garbage collector if the object could be in the Java heap.

CODE EXAMPLE 5 shows an example:

CODE EXAMPLE 5 Creating a handle

CLASS getClassX(CHAR_HANDLE name, int start, int length); 
 
/* Case 1. We are calling it with an argument that is known */ 
/*         not to be in the heap. */ 
const char *x = "java/lang/Object"; 
result = getClassX(&x, 0, strlen(x)); 
 
/* Case 2. We are calling it with a heap argument. */ 
START_TEMPORARY_ROOTS 
    DECLARE_TEMPORARY_ROOT(char *, x, mallocBytes(100)); 
    sprintf(x, "java/lang/%s", arg); 
    result = getClassX(&x, 0, strlen(x)); 
END_TEMPORARY_ROOTS
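To see why a handle must be the address of a registered variable rather than a plain pointer, consider the toy model below. It is not the KVM collector: every name in it (toy_heap, toy_register_root, toy_collect and so on) is hypothetical, and the "collection" simply copies one half of a small array to the other half and then fixes up the registered variables. It assumes every registered variable is either NULL or points into the first half.

```c
#include <assert.h>
#include <string.h>

#define HALF 64
static char   toy_heap[2 * HALF];  /* [0, HALF) old space, [HALF, 2*HALF) new */
static char **toy_roots[4];        /* registered handles                      */
static int    toy_root_count;

static void toy_register_root(char **handle)
{
    assert(toy_root_count < 4);
    toy_roots[toy_root_count++] = handle;
}

/* Toy "collection": move everything, then update each registered variable. */
static void toy_collect(void)
{
    int i;

    memcpy(toy_heap + HALF, toy_heap, HALF);
    for (i = 0; i < toy_root_count; i++) {
        char *old = *toy_roots[i];
        if (old != NULL)
            *toy_roots[i] = toy_heap + HALF + (old - toy_heap);
    }
    memset(toy_heap, '?', HALF);   /* the old locations now hold garbage */
}

/* Returns 1 if the registered pointer followed the object and the
 * unregistered copy was left pointing at garbage. */
static int handle_survives_move(void)
{
    char *name = toy_heap;         /* "allocated" at the start of old space */
    char *stale;
    int   ok;

    strcpy(name, "java/lang/Object");
    stale = name;                  /* raw copy, unknown to the collector */
    toy_register_root(&name);      /* &name plays the role of the handle */
    toy_collect();

    ok = (name != stale)
      && strcmp(name, "java/lang/Object") == 0
      && stale[0] == '?';

    toy_root_count--;              /* unregister before 'name' goes away */
    return ok;
}
```

The registered variable is updated through its address, while the unregistered copy keeps the old address. This is the failure the temporary-root macros are designed to prevent.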

11.2.8.4 Temporary Roots

The most common way to register a temporary root is to use START_TEMPORARY_ROOTS and END_TEMPORARY_ROOTS to delimit a region of code. Within this region of code, the macro

    DECLARE_TEMPORARY_ROOT(type, variable, value)

creates a local variable of the specified type with the specified initial value. The value must either be a pointer to an object in the heap, or it must be a value that is clearly not in the heap (such as NULL, a pointer to permanent space, or the like) [2]. The value &variable is registered with the garbage collector as a temporary root.

You are allowed to change the value of variable, provided that any new value is always either a pointer to an object in the heap, or a value that is clearly not in the heap.

The garbage collector ensures that, whenever a garbage collection occurs, the value of the variable is updated if the object it points to has moved. In addition, &variable is a handle, and can be passed as an argument to any function that expects a handle.

Your code must not return from inside the region delimited by START_TEMPORARY_ROOTS and END_TEMPORARY_ROOTS. The END_TEMPORARY_ROOTS macro contains cleanup code that must be executed.

CODE EXAMPLE 6 below shows some sample code for a native method that takes a String and two integers as arguments, and which must allocate a temporary buffer.

CODE EXAMPLE 6 Temporary roots

START_TEMPORARY_ROOTS 
    int y = popStack(); 
    int x = popStack(); 
    DECLARE_TEMPORARY_ROOT(STRING_INSTANCE, string,  
                              popStackAsType(STRING_INSTANCE)); 
    DECLARE_TEMPORARY_ROOT(char*, buffer, mallocBytes(100)); 
    /* code that might perform allocation */ 
END_TEMPORARY_ROOTS

If the code clearly cannot perform any allocation, then you could instead have written

    char* buffer = mallocBytes(100);

Less commonly used is the macro

    DECLARE_TEMPORARY_ROOT_FROM_BASE(type, var, value, base)

In this case base must be a pointer to an object in the heap, and value must be a pointer into the middle of the object. The variable var is assigned the value value. The garbage collector will treat base as a root. If base is moved by the garbage collector, the value of var will be adjusted appropriately.
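The adjustment can be thought of as preserving the offset of the interior pointer from its base. The helper below is a hypothetical illustration of that arithmetic only; it is not a KVM function, and the collector's actual bookkeeping is not described here.

```c
#include <assert.h>
#include <stddef.h>

/* Recompute an interior pointer after the enclosing object has moved
 * from oldBase to newBase. The offset into the object is unchanged. */
static char *rebase_interior(char *oldBase, char *oldInterior, char *newBase)
{
    ptrdiff_t offset = oldInterior - oldBase;
    return newBase + offset;
}
```

For example, if var pointed 10 bytes into the object before the move, it points 10 bytes into the relocated object afterwards.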

11.2.8.5 Global roots

If your code initializes a C variable to point to an object in the Java heap, you can use the code shown in CODE EXAMPLE 7. There is currently no function for removing a variable from the set of global roots.

CODE EXAMPLE 7 Creating a global root

variable = <value>; 
makeGlobalRoot((cell **)&variable);

This code ensures that the garbage collector knows that the specified variable contains a value that must be protected from garbage collection. If the garbage collector moves the object, the variable is updated to point to the object's new location.

11.2.8.6 Debugging your native code

A special garbage collector is provided to help you debug your native code and to ensure that it does not have any garbage collection problems. You access this garbage collector by replacing the file collector.c with collectorDebug.c. In addition, you should set the compiler flags INCLUDEDEBUGCODE and EXCESSIVE_GARBAGE_COLLECTION to 1.

This replaces the compact-in-place garbage collector with a 10-space Cheney-style [3] garbage collection algorithm. A garbage collection occurs on every allocation, and also on some operations that might have allocated but didn’t. Every object moves on every garbage collection. In addition, this code makes use of memory protection so that any attempt to read or write a bad pointer generates a memory fault.

This code uses the following implementation-dependent functions:

    void* allocateVirtualMemory_md(long size); 
    void  freeVirtualMemory_md(void *address, long size); 
    void  protectVirtualMemory_md(void *address, long size,  
                                  int protection);

Implementations of these three functions for Windows and for Unix are provided. For any other target platform, you must implement these functions yourself.

11.2.8.7 Two-space Cheney garbage collector

The file collectorDebug.c (see §11.2.8.6) also includes an implementation of a two-space non-debugging Cheney garbage collector. You get this implementation by setting the compiler flag CHENEY_TWO_SPACE to a non-zero value.

The Cheney collector is smaller and faster than the standard garbage collector. However, it uses twice as much heap space. If your implementation has a lot of available memory, but needs a faster garbage collector, you might consider using this garbage collector.

This collector is not supported by Sun, and is provided as is.

11.2.9 Initialization and reinitialization of global variables

Generally, the C language guarantees that all global and static variables are initialized to 0 (zero).

The current implementation is designed to work within an embedded environment. For example, on the PalmOS, the user can start the virtual machine, exit a program, and then restart the virtual machine with a different set of arguments. There is no re-initialization of global or static variables between the two runs.

In general, your code cannot assume the initial value of any variable. You have several options for determining when it is necessary to perform one-time only initialization.

You can use the function InitializeNativeCode() to either initialize your variables, or to set a flag indicating that initialization needs to be performed.

If a private native method is called as part of static initialization of a class, the method’s native implementation will be called the first time the class is used. The native implementation can perform any initialization necessary for the class.

If a variable is part of the global root set (see makeGlobalRoot() above), its value is guaranteed to be 0 the next time that the virtual machine is run.
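The first option can be sketched as follows. The function name InitializeNativeCode comes from the text above, but its exact signature, the two variables and the helper functions are assumptions made for this example. The important point is that every piece of native state is reset explicitly rather than relying on C's zero initialization.

```c
#include <assert.h>

/* Native module state. C zero-initializes these only once per process,
 * which is not enough if the VM can be restarted within that process. */
static int openWindowCount;
static int graphicsReady;        /* 0 = one-time setup still pending */

/* Assumed signature: called each time the virtual machine starts. */
void InitializeNativeCode(void)
{
    openWindowCount = 0;
    graphicsReady   = 0;         /* request lazy initialization again */
}

/* Hypothetical helper: performs one-time setup on first use. */
static void ensureGraphicsReady(void)
{
    if (!graphicsReady) {
        /* ... platform-specific setup would go here ... */
        graphicsReady = 1;
    }
}

/* Hypothetical native method body that depends on the state above. */
static void nativeOpenWindow(void)
{
    ensureGraphicsReady();
    openWindowCount++;
}
```

Calling InitializeNativeCode a second time models a restart of the virtual machine without a restart of the surrounding process.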

11.3 Native code lookup tables

Regardless of whether you use the KNI (Section 11.1 “Using the K Native Interface (KNI)”) or the old-style native method implementation technology (Section 11.2 “Implementing old-style native methods”), as part of the build process you must create the lookup tables that map methods to the corresponding native implementation.

The JavaCodeCompact (JCC) utility generates these tables automatically. You should use this utility to generate the lookup tables whether or not you are using the other features of JavaCodeCompact.

JavaCodeCompact is more fully described in Chapter 14. The specific details for creating the file containing the lookup tables can be found in Section 14.5 “Executing JavaCodeCompact.”

The name of the C function that implements a native method must be the same name that JNI [4] would assign to the native method.
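For simple cases the JNI name is the prefix Java_, the fully qualified class name with its separators replaced by underscores, another underscore, and the method name. An underscore that appears in the original name is written as _1. The helper below sketches only this simplified rule; it is a hypothetical illustration, not part of KVM or JavaCodeCompact, and it ignores the escapes for other special characters and the signature suffix that JNI adds for overloaded native methods.

```c
#include <assert.h>
#include <string.h>

/* Append src to the NUL-terminated string in dst (capacity cap),
 * mapping '.' and '/' to '_' and a literal '_' to "_1". */
static void appendMangled(char *dst, size_t cap, const char *src)
{
    size_t n = strlen(dst);

    /* n + 2 < cap leaves room for up to two characters plus the NUL. */
    for (; *src != '\0' && n + 2 < cap; src++) {
        if (*src == '.' || *src == '/') {
            dst[n++] = '_';
        } else if (*src == '_') {
            dst[n++] = '_';
            dst[n++] = '1';
        } else {
            dst[n++] = *src;
        }
    }
    dst[n] = '\0';
}

/* Build "Java_<class>_<method>" into out (capacity cap). */
static void nativeFunctionName(char *out, size_t cap,
                               const char *className, const char *methodName)
{
    size_t n;

    assert(cap > 6);                      /* room for "Java_" and NUL */
    strcpy(out, "Java_");
    appendMangled(out, cap, className);

    n = strlen(out);
    if (n + 1 < cap) {                    /* separator before the method */
        out[n] = '_';
        out[n + 1] = '\0';
    }
    appendMangled(out, cap, methodName);
}
```

Applied to the class com.sun.kjava.Graphics and the method drawRectangle, this yields the function name used in CODE EXAMPLE 3.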

11.4 Asynchronous native methods

Note – The KNI implementation does not allow asynchronous native methods to be written using the KNI API. In order to write asynchronous native methods for the KVM, you must use the old-style native function interface as described in this section.

From the operating system viewpoint, KVM is just one process (C program) with only one native thread of execution. The multithreading capabilities of KVM have been implemented entirely in software without utilizing the possible multitasking capabilities of the underlying operating system. This approach not only makes the virtual machine highly portable and independent of the operating system, but also greatly simplifies the virtual machine design and improves the readability of the codebase, as the virtual machine designer does not have to worry about mutual exclusion and other problems typically associated with multithreaded software.

However, an unfortunate side effect of the approach described above is that by default, all native methods in KVM are “blocking.” This means that when a native function is called from the virtual machine, all the threads in the VM stop executing until the native method completes execution.

As a general guideline, all the native functions called from KVM should be written so that they complete their execution as soon as possible. However, in many environments this is not desirable or fully possible. For this reason, KVM includes an implementation of “asynchronous native methods” described below.

11.4.1 Design of asynchronous methods

The standard implementation of KVM runs as a single “task” from the operating system’s point of view. If a native method performs an operation that can block, the entire KVM blocks.

Asynchronous native methods are intended to solve this problem. When such a native method is called, the operation is performed “off-line” in an implementation-dependent manner. Other Java threads can continue running normally. When the native call finishes, the Java thread that originally called the native method continues.

To use asynchronous native methods, you must include

    #define ASYNCHRONOUS_NATIVE_FUNCTIONS 1

in your machine-dependent include file.

Asynchronous native methods cannot be defined in the same file as normal native methods. In addition to their normal includes, they must also add the include file async.h.

Asynchronous methods should always have the following form:

ASYNC_FUNCTION_START(functionname) 
    code 
ASYNC_FUNCTION_END

Your code must never use pushStack(), popStack(), topStack, or any macro or function that references the stack pointer, the frame pointer, or the current thread. Instead, you must use the alternative macros shown in Table 14.

TABLE 14 – Macros used in asynchronous methods
Native function macro	Asynchronous native function macro
`popStack`	`ASYNC_popStack`
`pushStack`	`ASYNC_pushStack`
`popLong`	`ASYNC_popLong`
`pushLong`	`ASYNC_pushLong`
`popStackAsType`	`ASYNC_popStackAsType`
`pushStackAsType`	`ASYNC_pushStackAsType`
`raiseException`	`ASYNC_raiseException`
`topStack`	do not use this macro

In addition, your code must not perform a “return.” It must complete through the end, since ASYNC_FUNCTION_END may generate some necessary cleanup code.

All the macros in Table 14 have been designed so that if the symbol ASYNCHRONOUS_NATIVE_FUNCTIONS is 0, the asynchronous method compiles into a normal native method.

It is also important to note that unlike regular native methods, asynchronous native methods cannot allocate any memory from the Java heap. Because of this limitation, extra caution is often necessary when writing asynchronous native methods, since many internal routines in KVM may indirectly allocate memory from the Java heap.

Note – IMPORTANT: We repeat that asynchronous native methods must not allocate memory from the Java heap. Make sure that you read the paragraph above.

If you use asynchronous native methods, you must define the following machine-specific functions.

void Yield_md()

Pause this operating system task momentarily and allow other tasks to run.

void CallAsyncNativeFunction_md(ASYNCIOCB *iocb, void(*afp)(ASYNCIOCB *))

Call an asynchronous native function. This function is called by the ASYNC_FUNCTION_START macro to start a new asynchronous function. It takes two parameters: a data structure that the garbage collector uses to keep the object pointers used by the native code up to date, and a function to call. This function will typically start a new native thread and have that thread call the supplied function with the ASYNCIOCB as its parameter.

enterSystemCriticalSection()
exitSystemCriticalSection()

Enter or exit a critical section. The operating system must guarantee that at most one operating system task is allowed to be inside the critical section at a time.

11.4.2 Implementation of asynchronous methods

We envision two possible implementations of asynchronous methods.

In the current reference implementation, the function CallAsyncNativeFunction_md spawns off a separate operating system task which performs the indicated function. For example, in a Posix implementation one could use pthread_create.

CODE EXAMPLE 8 below shows one possible implementation of a method int readBytes(byte[] dst, int offset, int length) using this style of asynchronous native methods. This particular example assumes that the garbage collector does not move objects.

CODE EXAMPLE 8 Asynchronous implementation of ReadBytes

ASYNC_FUNCTION_START(ReadBytes) 
    long   length = ASYNC_popStack(); 
    long   offset = ASYNC_popStack(); 
    BYTEARRAY dst = ASYNC_popStackAsType(BYTEARRAY); 
    INSTANCE instance = ASYNC_popStackAsType(INSTANCE);/* this*/ 
    long fd = getFD(instance); 
    ASYNC_enableGarbageCollection(); 
    length = read(fd, dst->bdata + offset, length); 
    ASYNC_disableGarbageCollection(); 
    ASYNC_pushStack((length == 0) ? -1 : length); 
ASYNC_FUNCTION_END

In an alternative implementation (CODE EXAMPLE 9), CallAsyncNativeFunction_md simply calls the supplied function directly. It assumes that this function starts an operation, but does not wait for its completion. The operating system is required to provide some sort of interrupt or callback to indicate when the operation is complete.

The second implementation is far more operating system-dependent. It might be impossible to write native methods that can work both synchronously and asynchronously, depending on the value of a flag.

CODE EXAMPLE 9 Alternative asynchronous implementation of ReadBytes

/* Forward declaration of the completion callback defined below */ 
static void ReadBytesDone(void *parm, int length); 
 
static void ReadBytes(THREAD thisThread) 
{ 
    long   length = ASYNC_popStack(); 
    long   offset = ASYNC_popStack(); 
    BYTEARRAY dst = ASYNC_popStackAsType(BYTEARRAY); 
    INSTANCE instance = ASYNC_popStackAsType(INSTANCE); /* this */ 
    long fd = getFD(instance); 
    /* Call OS to perform I/O. Perform callback when done. */ 
    /* The destination is the array data, as in CODE EXAMPLE 8. */ 
    AsyncRead(fd, dst->bdata + offset, length, ReadBytesDone, thisThread); 
} 
 
/* Callback function when I/O is finished */ 
static void ReadBytesDone(void *parm, int length) 
{ 
    THREAD thisThread = (THREAD)parm; 
    ASYNC_pushStack((length == 0) ? -1 : length); 
    ASYNC_RESUME_THREAD(); 
}

Refer to Section 12.1.4 “Asynchronous notification,” for further information on writing asynchronous code.

[1] The Java Native Interface: Programmer’s Guide and Specification (Java Series) by Sheng Liang (Addison Wesley, 1999).

[2] The main purpose of this limitation is that the variable should not have a random integer as its value, and that the variable must be initialized.

[3] C.J. Cheney. A non-recursive list compacting algorithm. Communications of the ACM, 13(11):677-8, November 1970.

[4] See The Java Native Interface: Programmer’s Guide and Specification (Java Series) by Sheng Liang (Addison Wesley, 1999), for complete information on the JNI naming scheme. This information is available online at http://java.sun.com/docs/books/jni/index.html.

KVM Porting Guide, CLDC 1.1