              A Memory-Constrained Image-Processing Architecture
                                       
   July 1997 Dr. Dobb's Journal
   
  by Mayur Patel
  
     _________________________________________________________________
                                      
   Sidebar: A Macro Technique for Managing Types
     _________________________________________________________________
                                      
    Image-processing applications are hungry for both processing
   time and memory. CPU demand typically does not pose a problem, since
   most operating systems' schedulers can adequately balance the needs of
   competing applications. However, virtual memory is the only service
   provided by the operating system to deal with the memory load.
   Allowing an image-processing application to swap memory pages to disk
   impairs all the activity on the machine.
   
   A good image-processing library should deal with this memory problem
   with minimal supervision from you. In this article, I present an
   image-processing architecture that gives you a reasonable degree of
   control over memory consumption.
   
   Tile On Demand
   
   Tile-based processing consists of dividing an image into small,
   rectangular pieces called "tiles," processing each tile, and
   reassembling the image. The principle of locality applies, so an
   image-processing operation usually requires input tiles of
   approximately the same area as the output tile. Clearly, processing
   each tile requires less memory than processing the entire image.
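
    To make the decomposition concrete, the following sketch (with
    invented names; nothing here comes from the article's code) walks an
    image as a grid of fixed-size tiles, clamping the tiles in the last
    row and column to the image boundary:

#include <algorithm>

// Visit an imageW-by-imageH image as a grid of tileW-by-tileH tiles.
// Tiles along the right and bottom edges are clamped, so they may be
// smaller than the nominal tile size.
void forEachTile(int imageW, int imageH, int tileW, int tileH)
{
    for (int y = 0; y < imageH; y += tileH) {
        for (int x = 0; x < imageW; x += tileW) {
            const int w = std::min(tileW, imageW - x);
            const int h = std::min(tileH, imageH - y);
            // ... process the tile with origin (x, y) and size (w, h) ...
        }
    }
}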
   
   Many existing image-processing libraries support scan-line-based
   rather than tile-based image processing. Scan-line libraries are easy
   to implement and often perform efficiently because most image file
   formats store data in scan-line chunks. However, a tile-based approach
   retains the same advantages (scan-lines are simply tiles with a height
   of one pixel) and is more general.
   
   One case where you benefit directly from flexible tile size is with
   transform-based image compression. Many popular Discrete Cosine
    Transform-based compression algorithms require 8×8 tiles. With
    scan-line-based libraries, you must construct 8×8 units by writing
    eight scan lines into a buffer and moving data back and forth from the
    buffer accordingly. This is unnecessary busy work; it is better for
    the library to provide tiles in whatever proportions you request.
   
   Another convenient feature is lazy evaluation. Processing on demand
   reduces the expense of image processing and promotes interactivity.
   For example, users may reorder the sequence of operations several
   times before explicitly requesting an output image. Users may also
   require only a small portion of the output image or may extract data
   somewhere in the middle of the pipeline. Such requests can be serviced
   in less time than it takes to produce the entire output. Demand-driven
   processing helps to minimize wasted computation in these situations.
   
   Figure 1
    In this framework, data requests propagate from the output
   toward the input (see Figure 1). The input satisfies those requests
   and triggers the appropriate computation, and the result flows back
   toward the output. Along the way, operations are applied to the data
   according to the dependency information already supplied through the
   application. By the time the data reaches the output, all the
   specified operations have been performed.
   
   Coupling demand-driven execution with tile-based processing yields an
   environment with great potential. Certainly, these features carry with
   them nontrivial overhead. However, they also work together to reduce
   the total amount of memory required to process images. The extra
   cooking time is often worth the wait.
   
   Object Interaction
   
   This tile-based image-processing architecture is inherently object
    oriented. The interaction between three fundamental classes (Image,
    ImageTile, and ImageMan) is the foundation of the system.
   
   Tiles constitute the interface between the library and the
   application. You need not be aware of the internal representation of
   the image as long as the tile implementation is sufficiently general
   to handle the widest range of cases.
   
   Applications request pixels in two ways. The more convenient interface
    is Image::newTile(), which allocates a new tile and fills it with
   image data. The other interface is Image::fillTile(), which accepts an
   existing tile for processing and overwrites its contents. The API is
   as simple as that.
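
    The article names the two methods but not their signatures; as a
    rough sketch, the interface might look something like the following,
    where the ImageTile type and the (x, y, z, c, width, height)
    parameter list are assumptions:

class ImageTile;   // a reference-counted tile of pixel data

class Image {
public:
    virtual ~Image() {}

    // Allocate a new tile covering the requested region and fill it
    // with image data.  (Assumed signature.)
    virtual ImageTile* newTile(int x, int y, int z, int c,
                               int width, int height) = 0;

    // Overwrite the contents of an existing, caller-supplied tile.
    // (Assumed signature.)
    virtual void fillTile(ImageTile& tile) = 0;
};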
   
   Figure 2
    Behind the scenes, the image interacts with the image manager;
   see Figure 2. Image::newTile() requests (which go to the image
   manager) let the image manager provide a host of services, including
   caching. The manager eventually comes back to the image and uses its
   Image::fillTile() method to compute data.
   
   In the obvious implementation of this interaction pattern, there is
   one inefficiency. When the system receives a newTile() request, it
   immediately allocates memory. Image::fillTile() then pulls input data
   into scope for local processing. The fillTile() method most likely
   uses the newTile() interface to request data, so memory is allocated
   at every step as requests propagate backward. These allocations aren't
   actually needed until the data begins to flow forward.
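
    A minimal, self-contained sketch of this recursion follows. Only the
    newTile()/fillTile() pattern itself comes from the article; the class
    and member names are invented, and the tiles are simplified to a
    single float channel:

#include <cstddef>
#include <vector>

// A deliberately simplified tile: one channel, float samples.
struct Tile {
    int x, y, w, h;              // region covered by this tile
    std::vector<float> pixels;   // w*h samples
    Tile(int x_, int y_, int w_, int h_)
        : x(x_), y(y_), w(w_), h(h_), pixels(std::size_t(w_) * h_, 0.0f) {}
};

struct Image {
    virtual ~Image() {}
    // newTile() allocates immediately, then delegates to fillTile() for
    // the data -- the behavior described above.
    Tile* newTile(int x, int y, int w, int h) {
        Tile* t = new Tile(x, y, w, h);
        fillTile(*t);
        return t;
    }
    virtual void fillTile(Tile& t) = 0;
};

// A source image: generates its data directly.
struct ConstantImage : Image {
    void fillTile(Tile& t) {
        for (std::size_t i = 0; i < t.pixels.size(); ++i) t.pixels[i] = 0.5f;
    }
};

// An operator image: pulls its input on demand, then processes it.
// Its fillTile() calls the input's newTile(), so memory is allocated at
// every step as the request propagates backward.
struct InvertImage : Image {
    Image* input;
    explicit InvertImage(Image* in) : input(in) {}
    void fillTile(Tile& t) {
        Tile* in = input->newTile(t.x, t.y, t.w, t.h);
        for (std::size_t i = 0; i < t.pixels.size(); ++i)
            t.pixels[i] = 1.0f - in->pixels[i];
        delete in;
    }
};

    Requesting a tile from an InvertImage wired to a ConstantImage
    triggers one allocation per stage, exactly as described.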
   
   A more constrained approach allocates memory only after the input data
   is available, but I prefer the former behavior for two reasons:
   
     * It supports a flexible interface between the image and image
       manager.
     * It supports a simple developer interface for the implementation of
       real image processing operators.
       
   Implementation
   
   The tiles in my architecture use the (X,Y,Z,C) image-space model. The
   X component represents the width, Y represents the height, C
   represents the color channel, and Z the frame number. This model lets
   you describe images of any number of channels. It also lets you
   describe a series of images as a sequence, which is useful for
   processing digital video or film since an animation is expected to
   consist of several frames, each with a uniform width, height, and
   channel count.
   
   Figure 3
    The coordinate (x,y,z,c) is sufficient to specify the smallest
   datum. In addition to these coordinates, my tile structure (see Figure
   3) contains width and height elements that make it possible to
   represent a one-channel area from an image. You can access pixel data
   using methods provided by the tile class.
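
    A rough sketch of such a tile follows. The (x, y, z, c) origin plus
    width and height come from the description above; the member names,
    the accessor, and the float-only storage are assumptions:

#include <cstddef>
#include <vector>

class ImageTile {
public:
    ImageTile(int x, int y, int z, int c, int width, int height)
        : m_x(x), m_y(y), m_z(z), m_c(c), m_w(width), m_h(height),
          m_data(std::size_t(width) * height, 0.0f) {}

    // Origin of the tile in (X, Y, Z, C) image space.
    int x() const { return m_x; }
    int y() const { return m_y; }
    int z() const { return m_z; }          // frame number
    int c() const { return m_c; }          // color channel
    int width()  const { return m_w; }
    int height() const { return m_h; }

    // One sample of the single channel this tile covers, addressed
    // relative to the tile's own origin.
    float& at(int dx, int dy) { return m_data[std::size_t(dy) * m_w + dx]; }

private:
    int m_x, m_y, m_z, m_c, m_w, m_h;
    std::vector<float> m_data;             // simplified: one data type only
};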
   
    Different applications require pixels of different precision. Image
    data is commonly represented with 8 bits per color channel; for
    higher-quality work, however, 16 bits per channel is more useful. In
   some cases, particularly in scientific computing, 32-bit
   floating-point values (0.0 to 1.0) best represent each plane. This
   implementation supports all three options. A method allows typecasting
   the data between supported types.
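
    Only the choice of 8-bit, 16-bit, and 32-bit floating-point samples
    comes from the article; the enum, the helper names, and the
    normalization rules below are assumptions used to illustrate what
    such conversions involve:

enum PixelType { PIXEL_UINT8, PIXEL_UINT16, PIXEL_FLOAT32 };

// Map one 8-bit sample onto the other two representations:
// [0, 255] -> [0, 65535] and [0, 255] -> [0.0, 1.0].
inline unsigned short toUInt16(unsigned char v)  { return (unsigned short)(v * 257u); }  // 255 -> 65535
inline float          toFloat32(unsigned char v) { return v / 255.0f; }                  // 255 -> 1.0

// Back from float; assumes v has already been clamped to [0.0, 1.0].
inline unsigned char  toUInt8(float v) { return (unsigned char)(v * 255.0f + 0.5f); }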
   
    Granted, macros in C++ and cute macro tricks in C are generally
    considered poor programming style. However, I have used both to
    encapsulate data-type support; any technique, executed with
    discipline, has its place. In this
   situation, a new data type may be incorporated with the addition of
   only one function. No changes to existing code are necessary. You can
   judge the case for yourself. This approach is discussed in more detail
   in the accompanying text box entitled, "A Macro Technique for Managing
   Types."
   
   The tile class inherits the properties of a referenced object. Smart
   pointers may be used to organize tiles at the application level. The
   method Tile::deleteTile() is included to complement the
   Image::newTile() method. Image::newTile() provides construction
   services; Tile::deleteTile() provides dereferencing and destruction
   services.
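
    Assuming the interfaces sketched above, plus a deleteTile() member on
    the tile, the intended pairing looks like this:

// newTile() constructs and hands out a reference; deleteTile() releases
// that reference, and the tile destroys itself when no references remain.
void touchRegion(Image& image, int x, int y, int z, int c, int w, int h)
{
    ImageTile* tile = image.newTile(x, y, z, c, w, h);
    // ... read or write pixels through the tile's accessors ...
    tile->deleteTile();
}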
   
   My image manager implementation is generic, providing the minimum set
    of services required to meet its obligations. At most one image
    manager may exist at any time; an auxiliary class regulates access.
   The Singleton design pattern applies well to the image manager, making
   sure only one instance of a class exists at any time. Blending the
   referenced-object pattern with the Singleton pattern also ensures that
   the image manager is deallocated when it is not required. This is
   useful if image processing represents only a small portion of the
   total application. Attaching an image-manager reference to images
   through the base class ensures that the image manager is instantiated
   during the lifetime of any of its potential clients.
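
    Here is a minimal sketch of that blend of reference counting and the
    Singleton pattern. All names are assumptions, and the auxiliary class
    that regulates access is collapsed into static members for brevity:

#include <cassert>

class ImageMan {
public:
    // Acquire the single instance, creating it on first use.
    static ImageMan* acquire() {
        if (!s_instance) s_instance = new ImageMan;
        ++s_instance->m_refs;
        return s_instance;
    }

    // Release a reference; the manager deletes itself when unused.
    void release() {
        assert(m_refs > 0);
        if (--m_refs == 0) {
            s_instance = 0;
            delete this;
        }
    }

private:
    ImageMan() : m_refs(0) {}
    ~ImageMan() {}

    static ImageMan* s_instance;
    int m_refs;
};

ImageMan* ImageMan::s_instance = 0;

    Each image would call acquire() in its constructor and release() in
    its destructor, so the manager lives exactly as long as at least one
    client image does.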
   
   I provide two base classes for images: InputImage and SampledImage.
   InputImage best suits images (described as mathematical functions)
   that may have variable resolution or unlimited range. SampledImage
   best suits images of finite resolution and range.
   
   SampledImage uses the Strategy design pattern to deal with
   out-of-range behavior. The Strategy design pattern encapsulates
   implementations of an algorithm so that the algorithm isn't hardwired
   into a class. A SampledImage may receive a request during a
   convolution calculation for data beyond its range of valid samples.
   Fill-strategy objects may be attached to SampledImage to determine how
   this area is validated before the tile is returned to the client.
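
    The following sketch shows one way such a strategy object might look.
    The FillStrategy interface, the Region helper, and the constant-fill
    policy are assumptions; a clamp-to-edge or mirror policy would simply
    be further subclasses:

// A simplified one-channel region: origin and size in image space.
struct Region { int x, y, w, h; };

class FillStrategy {
public:
    virtual ~FillStrategy() {}
    // Fill the samples of 'tile' that fall outside the image's valid range.
    virtual void fill(float* pixels, const Region& tile, const Region& valid) = 0;
};

// One policy: pad out-of-range samples with a constant value (e.g. black).
class ConstantFill : public FillStrategy {
public:
    explicit ConstantFill(float value) : m_value(value) {}
    void fill(float* pixels, const Region& tile, const Region& valid) {
        for (int y = 0; y < tile.h; ++y)
            for (int x = 0; x < tile.w; ++x) {
                const int ix = tile.x + x;
                const int iy = tile.y + y;
                const bool inside = ix >= valid.x && ix < valid.x + valid.w &&
                                    iy >= valid.y && iy < valid.y + valid.h;
                if (!inside) pixels[y * tile.w + x] = m_value;
            }
    }
private:
    float m_value;
};

    A SampledImage would hold a pointer to its current fill strategy and
    invoke it after computing the in-range samples, so the policy can be
    swapped without touching the image class itself.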
   
   I don't use a base class for the output image. Since the output can
   flow to the screen or printer as easily as to a file, the output
   mechanism will often depend upon application-specific code.
    Additionally, the demand-driven characteristic of the architecture
    makes it easy to avoid placing constraints on the output.
   
   Example Applications
   
   Listing One is a test program that enhances the contrast of an SGI RGB
   file. The program first specifies data dependencies using class
   constructors. The first object opens an SGI RGB-format file. The
   second object executes the contrast-enhancement operation. The last
   object writes the data to an output file, possibly converting the data
   type. Once these dependencies are defined, instantiating the output
   image triggers processing of this simple chain.
   
   SGI RGB files are both powerful and flexible, supporting 8- or 16-bit
   integer channels and any number of color channels. As you can see from
   the Contraster class (see Listings Two and Three), implementing new
   operators is straightforward. Contraster consists of a constructor, a
   destructor, and a fillTile() implementation. Developers of new
   image-processing operators are well shielded from the image manager
   and other system complexities.
   
   The Contraster object executes the contrast-enhancement operation
   using floating-point math. I did this for two reasons. First, it
   demonstrates how easy it is to typecast the data in tiles. (But be
   warned: Typecasting is as expensive as any pixel-processing
   operation.) Second, using floating-point math makes it easy to deal
   with over- and under-exposure problems. Integer values may overflow or
   underflow, whereas floating-point values retain data integrity. There
   are ways around integer saturation, but I show floating-point in
   action for the sake of example.
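
    As an illustration of the second point, consider a gain applied about
    mid-gray. The formula below is an assumption about what a contrast
    enhancement might do, not the article's exact operator:

// Scale a normalized sample away from mid-gray.  With 8- or 16-bit
// integers, a result outside [0, 1] would overflow or have to be
// saturated; as a float it simply carries the out-of-range value until
// a later stage decides how to clamp it.
inline float enhanceContrast(float sample, float gain)
{
    return (sample - 0.5f) * gain + 0.5f;
}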
   
   Extending the Library
   
   Since my image-manager implementation doesn't take full advantage of
   the flexibility of the architecture, it is a good place to start
   extending my implementation. One of the first features that should be
   incorporated into the image manager is a tile-caching mechanism.
   Imagine a chain of convolution operations executed using the current
   implementation. As the requests for tiles propagate toward the input,
   the size of the requested area becomes larger. Over time, these
   requests will result in excessively redundant calculations.
   
   A cache minimizes redundancy by storing data that is perceived to be
    frequently used. Although the cache requires more RAM (one of the
    constraints you're trying to overcome), you can control how much RAM it
   uses. Giving some RAM back to the processing engine for caching is a
   guard against extremes in the space-time tradeoff. Remember, caching
   operator data saves more than the computation at that node in the
   pipeline. It saves computations at every node up to and including that
   operation.
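
    A sketch of a byte-budgeted tile cache is shown below. The key
    layout, the FIFO eviction policy, and all of the names are
    assumptions; the article asks only for a cache whose memory use the
    application can bound:

#include <cstddef>
#include <deque>
#include <map>
#include <vector>

// Identifies one cached tile: which image it came from and which region.
struct TileKey {
    const void* image;
    int x, y, z, c, w, h;
    bool operator<(const TileKey& o) const {
        if (image != o.image) return image < o.image;
        if (x != o.x) return x < o.x;
        if (y != o.y) return y < o.y;
        if (z != o.z) return z < o.z;
        if (c != o.c) return c < o.c;
        if (w != o.w) return w < o.w;
        return h < o.h;
    }
};

class TileCache {
public:
    explicit TileCache(std::size_t maxBytes) : m_maxBytes(maxBytes), m_bytes(0) {}

    // Return the cached pixels for 'key', or 0 if they are not resident.
    const std::vector<float>* find(const TileKey& key) const {
        std::map<TileKey, std::vector<float> >::const_iterator it = m_tiles.find(key);
        return it == m_tiles.end() ? 0 : &it->second;
    }

    // Insert a tile, evicting the oldest entries until the budget holds.
    void insert(const TileKey& key, const std::vector<float>& pixels) {
        if (m_tiles.count(key)) return;                  // already cached
        const std::size_t cost = pixels.size() * sizeof(float);
        while (m_bytes + cost > m_maxBytes && !m_order.empty()) {
            const TileKey victim = m_order.front();
            m_order.pop_front();
            m_bytes -= m_tiles[victim].size() * sizeof(float);
            m_tiles.erase(victim);
        }
        if (m_bytes + cost > m_maxBytes) return;         // tile exceeds the whole budget
        m_tiles[key] = pixels;
        m_order.push_back(key);
        m_bytes += cost;
    }

private:
    std::size_t m_maxBytes, m_bytes;
    std::map<TileKey, std::vector<float> > m_tiles;      // key -> pixel data
    std::deque<TileKey> m_order;                         // insertion order (FIFO)
};

    An image manager would consult such a cache before triggering any
    computation, and hand completed tiles to it on the way back toward
    the output.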
   
   How you implement the caching mechanism depends on the execution
   characteristics that you want the library to have. You can cache
   individual tiles, color channels, or entire images. You may need
   predictive caching or reactive caching. Matching your needs with a
   cache implementation is well worth the development time.
   
   Earlier, I described why I have two image base classes. The InputImage
   encapsulates infinite-range or infinite-resolution data. The
   SampledImage encapsulates traditional image representations. If the
   caching technique you prefer depends on the image's resolution or
   range, you will not be able to cache InputImage images. If so, then
   you may add InputImage support by making InputImage's newTile() method
   defer to the image manager. This gives you more flexibility in
   experimenting with new caches.
   
   Once you have implemented a cache, it would be nice to offer forced
   caching. Some image operators, such as a Fast Fourier Transform (FFT),
   are not localized. To execute an FFT, it is easiest to process the
   entire image at once. And it isn't cheap. Forced caching lets you
   identify and address hot spots for the caching system, such as these
   nonlocalized operators. Without forced caching, or if there isn't
   enough room to cache the entire FFT image, each tile request arriving
   at the FFT image will likely result in recalculation of the entire
   image.
   
   Another characteristic of the current image manager is that tile sizes
   are not regulated. Currently, the application is on its honor not to
   request more data than it actually needs at one time. Poorly designed
   applications may request the entire image in one call. The current
   image manager will propagate this request. This defeats the purpose of
   having an architecture designed to constrain memory demand.
   
   Request decomposition allows the image manager to decompose requests
   for large tiles into several requests for smaller tiles. Processing
   each smaller tile requires less memory than the whole. Another benefit
   of request decomposition is that the requests for the many small tiles
    may be executed in parallel. Giving the image manager the ability to
    dispatch concurrent requests brings parallel execution to the entire
    system; the developer of image-processing operators doesn't need to
    worry about writing parallel code.
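
    A sketch of the decomposition step, with per-sub-request dispatch
    bolted on via std::async, might look like the following. The size
    limit, the use of std::async, and every name here are assumptions,
    and the code uses modern C++ for brevity:

#include <algorithm>
#include <future>
#include <vector>

struct Request { int x, y, w, h; };

// Split one large request into sub-requests no bigger than maxW x maxH.
std::vector<Request> decompose(const Request& r, int maxW, int maxH)
{
    std::vector<Request> parts;
    for (int y = 0; y < r.h; y += maxH)
        for (int x = 0; x < r.w; x += maxW)
            parts.push_back(Request{ r.x + x, r.y + y,
                                     std::min(maxW, r.w - x),
                                     std::min(maxH, r.h - y) });
    return parts;
}

// Service each sub-request concurrently; 'fill' stands in for whatever
// routine (returning void) computes one sub-tile and copies it into the
// destination tile.
template <class Fill>
void serviceConcurrently(const Request& big, int maxW, int maxH, Fill fill)
{
    std::vector<std::future<void> > pending;
    for (const Request& part : decompose(big, maxW, maxH))
        pending.push_back(std::async(std::launch::async, fill, part));
    for (std::future<void>& f : pending)
        f.wait();   // all sub-tiles are complete before the big tile is returned
}

    Whether the sub-requests go to std::async as here, to a thread pool,
    or are serviced serially is an image-manager policy; the operator
    code does not change.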
   
   Conclusion
   
   The source code accompanying this article (see "Availability," page 3)
   implements the framework of this architecture. It is written in C++
   and should be fairly easy to port. I encourage you to develop the
   architecture into an image-processing library custom fitted to your
   specific environment. For small systems, it should allow processing
   pipelines of considerable complexity. For multiprocessor systems, it
   should allow multiple renders to coexist without malicious contention
   for resources.
   
    The Silicon Graphics ImageVision Library provides a superset of all
   the features I describe here, and I recommend that IRIX users evaluate
   it. However, it is not available for other platforms.
   
   Acknowledgment
   
   Thanks to Bob Amen and Cinesite Digital Film Center in Hollywood,
   California, for making equipment available for testing.
   
   DDJ
   
     _________________________________________________________________
                                      
    Copyright © 1998, Dr. Dobb's Journal