Introduction:
This project describes how techniques and tools used in Solaris for
library interface definition and binary compatibility may also prove
beneficial to Free and Open Source software development projects. Indeed,
a number of the practices we describe here have already been adopted
by the Linux GLIBC C library project [1,2]. The underlying theme of
this project is working to ensure release-to-release binary stability:
end users' systems and applications keep on working even when other
components of the system are upgraded.
The process we describe involves library developers (for example GLIBC,
X11, GNOME, KDE, ...) defining the public interfaces of their libraries
and continuing to provide those interfaces in an upward compatible
manner. Any exposed internal interfaces that are not intended for
application developer consumption are clearly marked as private.
These private interfaces are part of the internal implementation (used,
say, for communication within the package) and so do not need to evolve
compatibly: by keeping developers off these private interfaces, the
library is free to evolve and to change its implementation without
breaking end users. Also included in the process we describe
is the practice of scoping local to each library as many symbols as
possible to further reduce exposure to application breakage.
In general, the method we describe below is a useful scheme in terms
of providing compatibility to a large established end-user population.
This is because the costs of binary breakage (downtime, fixing, rebuilding
and re-testing applications) are quite noticeable in that situation.
We suspect that system distribution providers may find the techniques
we describe to be most worthwhile, because they can be used to improve
compatibility and stability for end users using their distribution.
In the long term, however, basically everyone benefits from improved
compatibility.
In a certain sense this project is complementary to standards projects,
the most important being the Linux Standard Base [3]. One way of looking
at the difference is that the Linux Standard Base works on compatibility
across different Linux distributions, whereas the mechanisms discussed
here are more focussed on the compatibility of a given distribution going
forward in time: to avoid end-user application breakage as users upgrade
components of their systems (in particular, the entire distribution).
Both types of compatibility work are important. To a certain extent
they overlap each other in what they accomplish, yet they also help each
other out by focusing on different areas.
It should be noted that the documentation and tools provided by this
project do not (and cannot) by themselves provide a complete solution
to the binary instability problem. Foremost, the project plan requires
the participation and commitment of library providers to be successful.
The more libraries (API/ABI's) that follow this plan, the more binary
stability is enhanced, and the payoff grows as time goes on. In addition, this
project concentrates on defining and maintaining only a certain part of
the interfaces an application depends upon (namely, the library binary
interface between it and the "system" shared libraries). This scheme
of course cannot stop all compatibility problems (e.g. changes in file
formats or file locations), but since historically a good fraction of
incompatibility has occurred at this interface, it is a good place to
begin working.
What is the ABI:
The Application Binary Interface (ABI) is the set of supported run-time
interfaces available for an application to use on the OS. The ABI
is very similar to the API (Application Programming Interface), but
differs in that it is the result of the source compilation process.
C source code written to the OS API is transformed by the C compiler
into a processor architecture specific binary for one of the ABI's (e.g.
32-bit or 64-bit address spaces) supported by the system.
The compilation process introduces several differences between
the ABI and API which are important for binary compatibility:
- Compiler directives (e.g., #define) can replace source-level
constructs with different ones. The resulting binary may lack
a symbol present in the source, or include one not present in
the source.
- The compiler may generate processor-specific symbols (e.g.,
arithmetic instructions) which invisibly augment or replace
source constructs.
- The compiler's layout of binaries may be specific to that
compiler and the versions of the source language which it accepts.
Thus identical code compiled with different compilers may produce
incompatible binaries; this has been the case with C++.
The ABI is essentially where binaries of differing origins meet and have
to work together as a single process to accomplish the task at hand
for the application. There are many opportunities for failure, e.g.:
- Missing libraries or shared objects.
- Missing interfaces.
- Incompatible changes in library interfaces.
- Libraries needed by an application that, in turn, depend on
a different (often incompatible) version of a third library that
is also needed by the application.
- Incompatible changes in the output and behavior of system
commands, utilities, and files.
and so on. All of these need to be guarded against or otherwise
applications will fail for the end-user after parts of their system
or software are upgraded or replaced. There will be benefits if
system integrators and ABI providers (e.g. library developers) can
work successfully at further reducing exposure to application breakage.
It may appear to be asking too much of the system integrators and library
developers to do even more work with respect to compatibility (especially
since much of this work is done on a volunteer and/or gratis basis).
However, the pay-off can be huge: the tens of millions of end-users vastly
outnumber the developers.
The ABI is important because it determines whether or not a binary
built on one release of the OS is able to run on subsequent releases.
This release-to-release binary compatibility is of increasing importance
to users because it means that their investment in applications can
be preserved across upgrades. Put another way, fear of application
breakage due to binary incompatibility is the single biggest reason for
user reluctance to adopt a new release of system software and technology;
binary compatibility allays that fear.
Defining the ABI:
Today a standard installation of a Unix operating system will have roughly
20,000 public symbol interfaces exported by the libraries it provides.
The number of private interfaces in the libraries (used for intra-package
communication e.g. library->library or utility-command->library) is of
the same order of magnitude. Library interface management is a large
problem and it will continue to get even larger: API's are growing
rapidly, primarily fueled by the contributions from many Free and Open
Source development projects.
It is true that in the life-cycle of an API there is a rapid growth
phase in which there is a great deal of change, and many of these
changes introduce incompatibilities. However, as time goes on certain
(and eventually nearly all) parts stabilize. Due to the utility of the
particular API, (and the utility resulting from applications that use
it), dependencies upon that API grow, and hence it becomes increasingly
important for it to be provided in a stable and well defined manner.
The task of maintaining an ABI stably is non-trivial. We describe here
some techniques applied in Solaris over the past five years that aid in
accomplishing this task. We describe a useful framework for defining and
maintaining the ABI, but, of course, a good deal of effort is required
on the part of library developers to adhere to this framework and focus
on and maintain compatibility.
For a given release of the system (that contains a number of "independent"
library packages, e.g. GLIBC, X11, GNOME, ...) the simplest way to define
the ABI is at the symbol interface level: e.g. shared library libxyz.so.1
provides the list of public interfaces:
{sym1, sym2, ...}
and exports (presumably out of necessity to communicate with its fellow
libraries and utilities in the same package, but not for consumption by
external applications) the list of private symbols:
{private_sym1, private_sym2, ...}.
Everything else is scoped local to the library and cannot be accessed,
even by other libraries or utilities in the same library package.
Ideally the public symbol information is not only described by manual
pages and documentation, but also resides inside the shared library binary
"libxyz.so.1" itself so as to avoid possible discrepancies.
In Solaris and in the GLIBC package of Linux a further step is taken
that adds a rather useful structure to the set of public symbols. It is
described as follows. When a library first appears, its public symbols
are put into a single named set, for example:
PUBLIC_1: {sym, sym, ...}
and the remaining private symbols are all placed in, for example, the named
set:
PRIVATE: {sym, ...}.
(These names are made up for the sake of example and are not currently
used.) Now, when the next release of the library appears, it will
likely have some new functionality in the form of new interfaces [4].
Then we add a new public set that reflects this new functionality:
PUBLIC_2: {sym', sym', ...}
as well as the original PUBLIC_1 set and the PRIVATE set (the latter
may have changed in an arbitrary way, but that doesn't matter because
only co-shipped libraries and/or utilities in the same library package
are supposed to use the private interfaces and for a given release they
all work together).
Similarly, as more releases of the library come out, additional public
sets are added: PUBLIC_3, PUBLIC_4, ... etc. All of this information, the
set names and set members, is recorded in the library itself in special
ELF sections. We refer to this procedure as "Library Versioning" [5].
Library versioning is useful in that it can be used to avoid renaming the
shared object with new minor release version numbers as it evolves. That
is, instead of the sequence of new files: libxyz.so.1.1, libxyz.so.1.2,
libxyz.so.1.3, ... as the library evolves, the shared object name can
remain fixed at libxyz.so.1 (as long as it evolves upward compatibly).
Furthermore, the traditional minor release incrementing "1.1" -> "1.2"
really only indicates "something was added". With library versioning
recorded in the shared library itself, the information is exactly
what was added: it is the interfaces listed in the PUBLIC_2 set.
When an application is built, the Solaris and GNU link editors (ld(1))
record in an ELF section of the resulting binary executable the highest
level "watermark" required by the application for each versioned library
(e.g. application binary "foo" needs "libxyz.so.1" at level PUBLIC_2 and
"libabc.so.1" at level PUBLIC_1, etc). At runtime, the dynamic linker
reads this information and while it loads the needed shared libraries it
can also quickly check whether the required version levels are supplied
by the loaded shared libraries. If not, it will exit with an error since
it is known at this point at least one symbol needed by the application
(or shared libraries) is missing [6].
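The load-time check described above can be pictured as a simple membership
test. The following is a minimal sketch in Python, not the dynamic linker's
actual algorithm; the function name check_versions and the two dictionaries
are hypothetical, and the version names follow the made-up PUBLIC_n example.

```python
def check_versions(needed, provided):
    """Return (library, version) pairs that the application requires but
    that the installed libraries do not define.

    needed:   {library name: version set name recorded in the application}
    provided: {library name: set of version names the library defines}
    """
    missing = []
    for library, version in sorted(needed.items()):
        # A library that is absent altogether provides no versions at all.
        if version not in provided.get(library, set()):
            missing.append((library, version))
    return missing


# Hypothetical application "foo": the watermarks recorded at link time.
needed = {"libxyz.so.1": "PUBLIC_2", "libabc.so.1": "PUBLIC_1"}

# A hypothetical older system whose libxyz.so.1 only defines PUBLIC_1.
provided = {
    "libxyz.so.1": {"PUBLIC_1", "PRIVATE"},
    "libabc.so.1": {"PUBLIC_1", "PRIVATE"},
}

for library, version in check_versions(needed, provided):
    print(f"{library}: version {version} not found")
```

Because the public chain is upward compatible, a library that defines
PUBLIC_2 also defines PUBLIC_1, so a plain membership test per library is
enough in this sketch.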
The recording of the library version symbol information and needed version
levels in shared libraries and executables, together with the practice of
scoping symbols local (see the following section), are useful aspects of
dynamic linking. However, a more subtle benefit is having the framework in
place in the shared library's source tree for maintaining the monotonically
increasing and upward compatible chain of public symbols (the PUBLIC_1,
PUBLIC_2, ... sets in the above example) as well as scoping local any
symbols that do not need to be visible.
Mechanics of Versioning Libraries:
We describe briefly here the basic technique used to add the versioning
information to shared libraries. The complete details can be found
in references [5] and [7].
Both the Solaris and GNU link editors ld(1) support the notion of
library symbol versioning by use of an input file (usually called a
"versioning mapfile" or "version script"). When this file is passed
to the link editor (via command line arguments: ld ... -M <file> and
ld ... --version-script=<file> on Solaris and Linux, respectively) and
the shared library is assembled, the versioning mapfile will be parsed
and its information recorded in special ELF sections of the library.
Here is a made-up example of a versioning mapfile for a fictitious
library libfoo.so.1 :
PUBLIC_2 {
global:
symbolD;
symbolE;
} PUBLIC_1;
PUBLIC_1 {
global:
symbolA;
symbolB;
symbolC;
};
PRIVATE {
global:
__fooimpl;
local:
*;
};
When libfoo.so.1 was first released, it exported just the three public
(i.e. intended for developer's use) symbols "symbolA", "symbolB",
"symbolC" and it also exported the library package private symbol
"__fooimpl". For intuition on the role of "__fooimpl", imagine that
a companion library "libbar.so.1" is co-shipped with "libfoo.so.1"
and libbar.so.1 occasionally calls __fooimpl() in libfoo.so.1 as, say,
a private communication channel for use in the (current) library package
implementation.
In a later release of libfoo.so.1 additional functionality was added in
the form of the two new functions "symbolD" and "symbolE". Note the
"watermark" chaining that occurs in the syntax: PUBLIC_2 includes
all of the symbols at level PUBLIC_1. This chaining continues with
subsequent releases of the library e.g.:
PUBLIC_4 {
global:
symbolH;
symbolI;
} PUBLIC_3;
PUBLIC_3 {
global:
symbolF;
symbolG;
} PUBLIC_2;
...
etc. This mechanism emphasizes the strictly monotonic increase with time
of the public interface offering: removing a public symbol (e.g. symbolD)
in some later release is very bad since it will break all applications
that require symbolD.
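To make the chaining concrete, here is a rough Python sketch that reads the
made-up libfoo.so.1 mapfile and lists every public symbol available at a
given level. It is only an illustration: the parser handles just the simple
syntax used in this document, and the names parse_mapfile and
symbols_at_level are hypothetical.

```python
import re

# The made-up mapfile for the fictitious libfoo.so.1.
MAPFILE = """
PUBLIC_2 {
    global:
        symbolD;
        symbolE;
} PUBLIC_1;
PUBLIC_1 {
    global:
        symbolA;
        symbolB;
        symbolC;
};
PRIVATE {
    global:
        __fooimpl;
    local:
        *;
};
"""

# NAME { body } [PARENT] ;
NODE_RE = re.compile(r"(\w+)\s*\{(.*?)\}\s*(\w+)?\s*;", re.S)


def parse_mapfile(text):
    """Return {version name: (parent name or None, [global symbols])}."""
    nodes = {}
    for name, body, parent in NODE_RE.findall(text):
        scope = None
        global_symbols = []
        for token in re.findall(r"[^\s;]+", body):
            if token in ("global:", "local:"):
                scope = token[:-1]
            elif scope == "global":
                global_symbols.append(token)
        nodes[name] = (parent or None, global_symbols)
    return nodes


def symbols_at_level(nodes, level):
    """All public symbols offered at `level`, oldest release first."""
    symbols = []
    while level is not None:
        parent, own = nodes[level]
        symbols = own + symbols  # inherited symbols come first
        level = parent
    return symbols


nodes = parse_mapfile(MAPFILE)
print(symbols_at_level(nodes, "PUBLIC_2"))
```

Walking the parent links is what makes PUBLIC_2 include everything at
PUBLIC_1: a symbol dropped from any level would vanish from every later one.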
When an application binary is built the highest level of the public
chain is recorded by the link editor. For example, if the application
only used "symbolB" the version dependency "PUBLIC_1" for libfoo.so.1
would be recorded in the binary. If, however, the application used
"symbolA", "symbolE", and "symbolF" then the level PUBLIC_3 would be
recorded instead. If that application was then distributed to an older
system with a library libfoo.so.1 that was only at level PUBLIC_2, then
the runtime linker would immediately know something was wrong and would
indicate the error and exit [8].
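The bookkeeping in this example can be sketched in a few lines of Python.
This is an illustration of the idea as described here, not the link
editor's implementation; CHAIN, recorded_level and can_run are hypothetical
names, and the symbol sets are those of the made-up libfoo.so.1.

```python
# Hypothetical public chain for libfoo.so.1, oldest level first.
CHAIN = [
    ("PUBLIC_1", {"symbolA", "symbolB", "symbolC"}),
    ("PUBLIC_2", {"symbolD", "symbolE"}),
    ("PUBLIC_3", {"symbolF", "symbolG"}),
    ("PUBLIC_4", {"symbolH", "symbolI"}),
]


def recorded_level(used_symbols, chain=CHAIN):
    """Highest level that introduces a symbol the application uses,
    or None if the application uses no versioned symbol."""
    used = set(used_symbols)
    highest = None
    for level, symbols in chain:
        if symbols & used:
            highest = level
    return highest


def can_run(required_level, library_level, chain=CHAIN):
    """True if a library at `library_level` satisfies `required_level`."""
    order = [level for level, _ in chain]
    return order.index(required_level) <= order.index(library_level)


needed = recorded_level(["symbolA", "symbolE", "symbolF"])
print(needed, can_run(needed, "PUBLIC_2"))
```

The final call models moving the application to an older system whose
libfoo.so.1 stops at PUBLIC_2.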
An additional benefit can be achieved with the library versioning
technology. Once the symbol sets (e.g. PUBLIC_1, PUBLIC_2, ...,
PRIVATE in the above example) are defined it is also possible to provide a
directive when building a shared library to scope local to the library
all remaining symbols. This is done by the "local: *" directive in
the mapfile shown above. Traditionally, scoping for libraries is done
by using the "static" C keyword in defining internal-implementation
functions. However, this only provides a per-file level of scoping,
and will not work if the shared library is composed of a number of
object files (.o files) and the internal implementation functions are
called between the object files. The library versioning "local: *"
directive allows the final scoping to occur when the whole shared object
is assembled to remove any symbols that do not need to be exported.
This local scoping technique is a convenient and powerful way to
make sure no internal implementation symbols "slip out" accidentally.
If application developers, either accidentally or intentionally, started
using these library-internal implementation symbols, their applications
would be at a high risk of breaking in the future. One's initial reaction
might be "too bad; that developer should not have used those symbols".
However, if the application, customer, or installed base is "important
enough" (by some measure) there could be pressure placed on the library
developer to actually support these otherwise internal interfaces.
It is best to use the "local: *" scoping technique to simply avoid this
problem in the first place.
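The effect of the "local: *" directive can be modelled as a filter over the
symbols defined by the library's object files. The Python sketch below is a
simplification that assumes explicitly named global symbols take precedence
over the wildcard; the function exported_symbols and the two internal
helper names are made up for illustration.

```python
from fnmatch import fnmatchcase


def exported_symbols(defined, global_patterns, local_patterns):
    """Return the symbols that remain visible outside the library."""
    exported = []
    for symbol in defined:
        if any(fnmatchcase(symbol, p) for p in global_patterns):
            exported.append(symbol)      # listed in a version set
        elif any(fnmatchcase(symbol, p) for p in local_patterns):
            continue                     # reduced to local scope
        else:
            exported.append(symbol)      # not mentioned: stays visible
    return exported


# Symbols defined across the object files of the fictitious libfoo.so.1;
# helper_parse and table_lookup stand in for internal functions that are
# called between object files and so cannot be declared "static".
defined = ["symbolA", "symbolB", "symbolC", "symbolD", "symbolE",
           "__fooimpl", "helper_parse", "table_lookup"]
global_patterns = ["symbolA", "symbolB", "symbolC", "symbolD", "symbolE",
                   "__fooimpl"]

print(exported_symbols(defined, global_patterns, ["*"]))
print(exported_symbols(defined, global_patterns, []))
```

The second call shows what happens without the directive: the two internal
helpers stay visible and can be bound to by applications.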
Tools to check for correct use of the ABI:
Once the library symbol versioning is in place in the libraries of a
library package, including the "PRIVATE" labelling of the library-package
internal symbols discussed above, it is a straightforward matter to
construct tools that can test for applications' conformance to the public
portion of the ABI.
Ideally, one would desire the entire ABI (i.e. all library packages on
the system) to be versioned and classified as described above. However,
even when only some of the library packages are versioned there is
still much utility in checking an application's usage against
the portion of the ABI provided by those packages.
In Solaris, nearly all of the 30,000 symbols in the 160 shared libraries
Solaris provides have been versioned in the manner outlined above.
The version set names convention used in Solaris is "SUNW_n.m" for the
public chain (where "n" and "m" are integers; "m" plays the role of the
minor-release number in the traditional scheme), and "SUNWprivate" for
the private symbol set. The "n" may be thought of as the major-release
number, but it does not really matter because a major-release, n -> n+1
indicates incompatible change for which there would have to be a separate
shared library.
The tool we provide in this project is called "abicheck". It is a
simple Perl script that runs system utility commands [9] to extract the
dynamic bindings of a built executable [10]. For each symbol binding
it deduces the symbol version set name the symbol resides in. If that
symbol's version set matches "private" abicheck prints out a warning.
The user may specify a different matching pattern on the command line.
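The core of that check can be sketched as follows. This is not abicheck's
actual code (abicheck is a Perl script); it is a minimal Python illustration
that assumes the bindings have already been extracted into
(symbol, library, version set) tuples. The function name, the
case-insensitive match, and the binding data (including the version name
SUNW_1.1) are assumptions made for the example.

```python
import re


def check_bindings(bindings, private_pattern="private"):
    """Return one warning per binding whose version set name matches
    `private_pattern` (matched case-insensitively in this sketch)."""
    matcher = re.compile(private_pattern, re.IGNORECASE)
    warnings = []
    for symbol, library, version in bindings:
        if matcher.search(version):
            warnings.append(f"PRIVATE: ({library}:{version}) {symbol}")
    return warnings


# Made-up bindings for a hypothetical executable.
bindings = [
    ("_select", "libc.so.1", "SUNWprivate_1.1"),
    ("select", "libc.so.1", "SUNW_1.1"),
]

for warning in check_bindings(bindings):
    print(warning)
```

Passing a different private_pattern plays the role of the command-line
option for choosing another matching pattern.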
abicheck is the core functionality of a tool used by Sun in Solaris
application certification branding programs called "appcert" [11].
abicheck runs on both Linux and Solaris and is basically appcert with
the certification "baggage" removed. We felt it was best to start with
a simpler, straightforward tool that is easier to understand and add
enhancements to, rather than port all of appcert to Linux and end up
with a fair amount of functionality that doesn't really apply to the
task at hand.
One feature that has been carried over from appcert, as an example of
possible extensions to abicheck, is a useful check for static linking
of system archive files (e.g. libc.a or libsocket.a). The practice of
statically linking system libraries into application binaries is not
good with respect to binary stability since the "old code" (from the
archive) that is bolted into the application may fail to work properly
when moved to newer or upgraded systems. The use of static linking
of non-co-shipped libraries is strongly discouraged from the binary
stability standpoint.
Here is example output from abicheck:
# uname -a
SunOS abi 5.8 Generic sun4u sparc SUNW,Ultra-1
# abicheck reader myclient gdate
reader: PRIVATE: (libc.so.1:SUNWprivate_1.1) _select
myclient: STATIC_LINK: libsocket.a
myclient: STATIC_LINK: libnsl.a
gdate: OK
This output indicates the application "reader" has latched onto a
direct call to the private interface, _select(). It should be calling
the published interface select(3C). The application "myclient" has
statically linked in the networking libraries: libsocket.a and libnsl.a.
The application binary "gdate" had no problems detected by the tool and
so gets an "OK".
Here is some analogous example output from Red Hat Linux 6.2:
# abicheck reader myclient /bin/date
reader: PRIVATE: (libc.so.6:GLIBC_2.1) __poll
myclient: STATIC_LINK: libc.a
/bin/date: OK
One important issue with respect to Linux is that currently no
shipped libraries (i.e. libraries in a distribution) have collected
the private symbols into a version set with a private label (e.g. there
is no GLIBC_PRIVATE for the GLIBC library package). GLIBC is the only
library package on Linux that has non-trivial library versioning, and
so currently abicheck has hard-wired in the criterion used in GLIBC
libraries that a leading underscore "_" indicates a private symbol.
There are a number of exceptions to this rule, and so abicheck currently
carries along an exception list. We hope in the future a GLIBC_PRIVATE
set will be established in the GLIBC library package.
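The leading-underscore criterion with an exception list amounts to a very
small predicate. The Python sketch below only illustrates the idea; the
exception entries shown are illustrative guesses and are not abicheck's
actual list.

```python
# Illustrative exception list only; abicheck's real list is not
# reproduced here.
PUBLIC_DESPITE_UNDERSCORE = {"_exit", "__libc_start_main"}


def looks_private(symbol, exceptions=PUBLIC_DESPITE_UNDERSCORE):
    """Heuristic: a leading underscore marks a symbol as private unless
    the symbol appears in the exception list."""
    return symbol.startswith("_") and symbol not in exceptions


for symbol in ("__poll", "poll", "_exit"):
    print(symbol, looks_private(symbol))
```

A named private version set would make both the heuristic and the
exception list unnecessary, since the library itself would then state
which symbols are private.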
It should be emphasized that abicheck is an initial tool provided as
an example of what can be done with public/private library versioning
in place. In principle the checking that it does could even be moved to
the dynamic linker itself thereby making the abicheck script obsolete.
The important thing, we feel, is to spread meaningful library versioning to
many library packages beyond GLIBC (e.g. X11 and GNOME) and to also adopt
the practice of classifying library-package internal symbols into a
private symbol set. Tools that use the library symbol versioning
information then become more useful in checking for binary stability, and
the library package source tree gains a useful structure for library
interface definition, maintenance, and local scoping. As API-providing
libraries become larger and more complicated (and also more numerous) it
will likely pay off to have this infrastructure in place.
Discussion:
The process described here (library symbol versioning using "versioning
mapfiles" including the creation of a SUNWprivate set for Solaris
internal interfaces) has been in place in Solaris for a number of years.
Currently (Solaris 8), nearly all of the libraries shipped in Solaris
(including library sources imported into Solaris: e.g. X11, Motif,
and CDE) have this interface definition practice. It is not absolutely
clear how much Free and Open Source library development projects will
benefit from this type of practice. However, we feel it is likely they
will benefit a great deal and certainly believe practices of this sort
are worth looking into.
An interesting distinction comes about in that Solaris is basically
shipped as a monolithic blob, whereas Open Source operating systems
tend to be much more modular with respect to their ABI's. This is
true in principle at least, since it is likely most end-users install a
particular distribution (that is also a monolithic blob). In any event,
the interesting possibility exists that there will be more fruitful
ways to create the library interface definitions and versioning than have
been used for Solaris. We feel a namespace separation based on library
package name (e.g. GNOME_1.4, GNOME_PRIVATE; GLIBC_n.m.l, GLIBC_PRIVATE)
is a useful generalization beyond what is done in Solaris. Additional
practices may be discovered as useful.
Also interesting is the possibility of "softening" the public/private
distinction. As an API is going through a rapid initial growth phase it
may be useful to have three categories of symbols: Public, Evolving,
and Private. The public ones are set in stone and the library package
is committed to maintaining their compatibility, and as before the
Private ones are internal-only. The "Evolving" category would contain
experimental interfaces. These may change incompatibly (e.g. by changing
their arguments or their behavior, or by disappearing entirely). A developer
concerned with producing stable applications should work to stay away
from the Evolving interfaces until they are stabilized and have been
moved to the Public set; on the other hand a developer needing to take
advantage of the experimental interface may decide the potential binary
incompatibility for his distributed application is worth the risk.
Extensions:
We feel the plan outlined above is the main message of this project and
we hope that the practices will be adopted, suitably modified, by open
source projects.
There are, however, some interesting extensions one can apply to the
library interface definition practice (in the spirit of IDL) that provide
useful by-products.
Rather than maintain regular mapfiles/version-scripts as described
in detail above, the library package source tree can maintain simple
ASCII repository files (one per library, say) that contain additional
information about the library interfaces. The repository will have a
number of different fields for each interface, for example:
- Function signature (i.e. return value type and named
arguments with their associated types).
- Data variables have the variable type and length recorded.
- The list of needed C header files (in, say, format)
associated with the interface and any required libraries
(in, say, -l format).
- The name of the library version (in the sense discussed in the
previous sections) the symbol resides in (e.g. GLIBC_2.2.1
or GLIBC_PRIVATE).
- Any architecture related information (e.g. a list of
different architectures on which the symbol is present).
- Conditions that indicate an exception has occurred when
the interface is called (e.g. return value is NULL).
- The list of the errno's associated with the interface.
- Whether the symbol is associated with any interface
aliases (i.e. weak symbols).
- Descriptions or comments about the interface.
and other information. By having such a repository for each library a
library developer has a central location to go to when he adds a new
interface to the library (or if there are changes to an interface).
The central location provides the definition of the interface: one
does not need to go rummaging around in header files and C source to
find the interface definition and other information.
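One way to picture such a repository entry is as a small record type. The
Python sketch below is an assumption about how the fields might be
modelled, not the format used in Solaris; the class Interface, its field
names, the helper synopsis, and the sample values for symbolA are all
hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Interface:
    """One repository record; field names are illustrative."""
    name: str
    version: str                 # e.g. "PUBLIC_1" or "PRIVATE"
    signature: str
    headers: list = field(default_factory=list)
    libraries: list = field(default_factory=list)
    errnos: list = field(default_factory=list)
    description: str = ""


def synopsis(record):
    """Render the header, signature and library fields as text."""
    lines = [f"#include <{header}>" for header in record.headers]
    lines.append(record.signature + ";")
    if record.libraries:
        flags = " ".join(f"-l{lib}" for lib in record.libraries)
        lines.append("link with: " + flags)
    return "\n".join(lines)


symbol_a = Interface(
    name="symbolA",
    version="PUBLIC_1",
    signature="int symbolA(const char *path)",
    headers=["foo.h"],
    libraries=["foo"],
    errnos=["ENOENT"],
    description="Made-up interface of the fictitious libfoo.so.1.",
)
print(synopsis(symbol_a))
```

Keeping every field in one record is what lets several tools read the same
definition instead of each parsing headers and C source separately.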
Some applications using the above sort of fields come to mind:
- Automatic generation of (some of the) documentation
for the interface (e.g. manpages typically document function
signatures, required libraries and header files, interface
exceptions and errno's).
- Generation of mapfiles/version-scripts used at the library's
build time to record the versioning information.
- Creation of comparison tools to detect incompatible changes
to public interfaces in libraries (e.g. a tool to compare the
signatures of interfaces in an earlier release or build with
the current build).
- Use of the function signatures and other information to
create lint and debugging versions of the library.
- Use of the function signatures and other information to
create dynamic tracing and "pretty-printing" of calls to the
library interfaces. This can be used by developers to debug
their applications, and also by end-users to troubleshoot
problems encountered in the field.
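Two of the uses listed above, mapfile generation and incompatible-change
detection, can be sketched briefly in Python. This is a hedged illustration
under simple assumptions (a signature is compared as a plain string, and
the repository has already been reduced to lists and dictionaries); the
function names and sample signatures are hypothetical.

```python
def render_mapfile(levels, private):
    """levels: [(version name, [symbols])], oldest first.
    Each version is emitted after the one it inherits from, so a
    dependency is always defined before it is referenced."""
    blocks = []
    parent = None
    for name, symbols in levels:
        body = "".join(f"        {symbol};\n" for symbol in symbols)
        tail = f" {parent}" if parent else ""
        blocks.append(f"{name} {{\n    global:\n{body}}}{tail};\n")
        parent = name
    body = "".join(f"        {symbol};\n" for symbol in private)
    blocks.append(
        f"PRIVATE {{\n    global:\n{body}    local:\n        *;\n}};\n")
    return "".join(blocks)


def incompatible_changes(old, new):
    """old, new: {symbol: signature} for two releases. Removed symbols
    and changed signatures are reported; additions are compatible."""
    problems = []
    for symbol, signature in sorted(old.items()):
        if symbol not in new:
            problems.append(f"removed: {symbol}")
        elif new[symbol] != signature:
            problems.append(f"changed: {symbol}")
    return problems


print(render_mapfile([("PUBLIC_1", ["symbolA"]), ("PUBLIC_2", ["symbolD"])],
                     ["__fooimpl"]))

old = {"symbolA": "int symbolA(const char *)", "symbolD": "void symbolD(int)"}
new = {"symbolA": "int symbolA(const char *, int)", "symbolE": "int symbolE(void)"}
print(incompatible_changes(old, new))
```

A real comparison tool would need a more careful notion of signature
equality than string comparison, for example to ignore whitespace or
argument names.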
Most of the libraries that are shipped with Solaris are maintained with
an interface repository scheme like that described above. These are
used in the Solaris build to create mapfiles that are passed to ld(1)
to record library versions, to monitor interface incompatibilities, and
to generate code used to create tracing utilities for the apptrace(1)
tool that is available on Solaris 8 and later.
apptrace uses the link-auditing [5] feature of the Solaris dynamic linker
that allows library bindings to be intercepted and replaced by bindings
to one's own functions. This mechanism is in the spirit of LD_PRELOAD
schemes but is more flexible. The apptrace interceptors for the Solaris
libraries act as wrappers for each function call that pretty-print the
function's arguments and return value along with calling the actual
Solaris library function (so that the application proceeds as normal).
It is a good deal faster than tracing tools such as truss(1) and strace(1)
that require breakpoint services from the kernel.
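The wrapper idea can be illustrated by analogy in Python. This sketch does
not use link-auditing or the dynamic linker at all; it only shows the shape
of an interceptor that prints a call's arguments and return value while
still calling the real function. The names traced and symbol_a are made up.

```python
import functools


def traced(function, log=print):
    """Wrap `function` so each call is reported as name(args) = result."""
    @functools.wraps(function)
    def wrapper(*args, **kwargs):
        result = function(*args, **kwargs)   # the real call still happens
        shown = ", ".join(
            [repr(arg) for arg in args]
            + [f"{key}={value!r}" for key, value in kwargs.items()])
        log(f"{function.__name__}({shown}) = {result!r}")
        return result
    return wrapper


def symbol_a(path, flags=0):
    """Stand-in for a library function."""
    return len(path) + flags


# Rebinding the name plays the role of intercepting the library binding.
symbol_a = traced(symbol_a)
symbol_a("/etc/motd", flags=1)
```

Replacing the body of the wrapper with code that returns a faked error
instead of calling the real function would turn the same structure into a
simple fault injector.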
Interface call interception coupled with the information in the library
interface repository can be generalized to a number of interesting
applications, such as creating convenient library "unit tests" and
also application fault injection tools (i.e. faked error conditions are
passed up to the application). In principle, one can imagine recording
much if not all of the repository information into the shared libraries
themselves so that information will be available for, and encourage the
development of, library interface (ABI) related tools.
References:
[1] http://www.gnu.org/software/libc/
[2] For convenience we will use the short term "Linux" to really mean a
Linux kernel based operating system. I.e. a system composed of much free
software (e.g. GNU, XFree86, ... etc.) placed around the Linux kernel to
yield a complete operating system. Examples include Debian GNU/Linux,
Red Hat Linux, and SuSE Linux.
[3] http://www.linuxbase.org/
[4] New functionality can, of course, be added to existing interfaces.
For example, by passing new values of a parameter to the interface.
The behavior of the existing interface cannot change in a way that breaks
existing applications.
[5] The Solaris Linker and Libraries Guide may be found in the
documentation collection at: http://docs.sun.com/ab2/coll.45.13
[6] This is analogous to setting the LD_BIND_NOW environment variable
(i.e. lazy binding is turned off) and looking for unresolved symbols,
however the versioning scheme has no measurable impact on application
performance.
[7] See the URL in [1] and the "Commands" -> "Version Script" section
of the GNU linker "ld" info page (e.g. /usr/info/ld.info*)
[8] This behavior can be overridden via setting an environment variable
LD_NOVERSION on Solaris.
[9] On both Solaris and Linux abicheck runs the ldd(1) command with
environment LD_DEBUG="files,bindings" to retrieve the dynamic linker's
information about dynamic symbol bindings. Additionally, it will run
pvs(1), dump(1), and elfdump(1) on Solaris and objdump(1) on Linux to
extract additional information about the binaries.
[10] Shared objects may also be checked, but this can often lead to
difficulties if not enough information is recorded in the shared object.
[11] http://www.sun.com/developers/tools/appcert/
[12] http://www.usenix.org/publications/library/proceedings/als2000/full_papers/browndavid/browndavid.pdf