Poll

Are you considering working with an UAV for surveying?


Spacer
Archive
Archive > December 2005, Volume 19, Issue 12 > Data Compression

Data Compression

  28/11/2005
This issue of GIM International presents a Product Survey on mobile mappers: small, lightweight, handy instruments for gathering data and integrating the collected data with digital maps. But how is all this stuff stored in a tiny mobile mapper? Data compression is the magic spell! I know what it is, I know what it does, but how does it work?
Henk Key, contributing editor and Mathias Lemmens, editor, GIM International

Data compression shrinks the size of a dataset so that more data can be stored on the device or transmitted faster over a communication channel than would have been possible without compression. Data compression is used in two main types of applications: data transmission (like communication by voice over the phone), and storage of, for example, raster data or coordinates within a GIS envir-onment.


It was Claude E. Shannon who in 1948 formulated the theory of data compression, both ‘lossless’ and ‘lossy’, in a paper entitled A Mathematical Theory of Communication. In lossless data compression the compressed data after decompression is an exact replica of the original. Reducing or eliminating redundancy, such as repeated patterns, in the data achieves data reduction. In lossy data compression the decompressed data need not be exactly the same as the original, often it suffices to have a reasonably close approximation, so that a certain degree of distortion is allowed between the original and decompressed data. WinZip creates lossless compression; the often-used JPEG algorithm is an example of lossy compression.

Lempel-Ziv-Welch
One of the most popular data compression programs used by mobile mappers is GZIP, which is based on the Lempel-Ziv-Welch (LZW) algorithm. LZW is one of the most common algorithms used in computer graphics, and is also used in popular programs such as ZIP and PKZIP). LZW is a general lossless compression algorithm capable of working on almost any type of data. The algorithm identifies patterns of data (blocks) in the dataset. A particular block is matched to the block in an indexed dictionary. When a match is found, the index of the substring involved is written to the output file.


No dictionary of blocks is available at the onset of compression as this created along redefined lines during the compression process: if the identified block is not present in the dictionary, it is written in and provided with an index. The newly created index is also written to the output file. During decompression the index pointer is read from the compressed file, and if the index pointer is not yet in the data dictionary it is written there. The index is then translated into the block it represents and written to the uncompressed output file. Since the decoder knows that the last symbol of the most recent dictionary entry is the first symbol of the next parse block, possible conflicts during decoding are avoided.


LZW does not require preservation of the dictionary to decode the decompressed file, which again saves storage place. This is accomplished by building up the dictionary along predefined lines making use of a priori knowledge of the data. Suppose that the data consists of a random alternation of two symbols from the alphabet: a and b. First the dictionary is initialised to contain all one-symbol blocks, in this case this is two (a,b) indexed as pos 0 and pos 1, then the combination ab (pos 2) is recognised, next bb (pos 3), followed by the combination ba (pos 4), and so on. If we decompress the string 011024… the character-combination is as follows abbaabba…

Coordinates
In data files containing coordinates patterns will obviously be found because the file is usually limited to a part of a country, a province or local government area. Accordingly, the first series of numbers of the X and Y coordinates are identical for all coordinates and can thus be replaced by an index referring to its position in the dictionary. As a result, the compression ratio of coordinate files with the LZW algorithm is very high. (Compression ratio is the ratio of the size of the original data to the size of the compressed data. For example, when a coordinate file is originally represented by six million bytes and after compression by one million bytes the compression ratio is 6:1, or 83.3%.) Theoretically, the size of the dictionary can grow to infinity, but in practice the size is limited to a recommended length of 4,096 blocks (12 bits); no more entries are added after this limit has been reached.





     


Comments (0):
There are no comments yet.
Make your comment:
Name:
Your comment:
Type over the 2 words (or number) from the picture
 
Most Popular articles Most Popular News Most Popular Jobs
Spacer


Spacer
Spacer
Spacer
 

Interactive


3D Scanning of Historic Sugar Factories

The Alliance for Integrated Spatial Technologies at the University of South Florida, USA, recently worked with the Florida Park Service on a project to document the remains of several historic sugar-mill sites in the State Parks to create as-builts to be used in preservation and conservation of these resources. The FARO LS 880, along with GPS and total station georeferencing and colour imaging, was used on these projects. 

 

 Last 5 items:
 3D Scanning of Historic Sugar Factories
 Road Improvement Survey with UAV
 3D BIM + money = 5D
 Setting up a survey in a swamp
 Launch of the 9th Baidu Satellite
 
Spacer
Spacer
Spacer
Spacer
Spacer