
Data processing requirements.

The compressed data stream produced by the satellites is bits per day. Two steps are needed to convert this information into a form suitable for queries: (1) the image data must undergo some signal processing, and (2) the image at each grid point must be converted into information indicating the type of ground cover (wheat, rice, rock, etc.) and the state of that ground cover (dormant, germinating, ready for harvest, etc.). We call this final form a model; it consists of an identifier and a state. This is the form most useful to those querying the database. Since perfect matches with models will be rare, the database will need to associate several "closest matching models" with each grid point. For simplicity, we assume that there will be different identifier/state combinations, and that closest matching models will be retained for each grid point.
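A model as described here is just an identifier/state pair, and each grid point keeps only its few best matches. The following is a minimal sketch of that record, assuming a bounded heap is used to retain the k closest models as candidates are scored; the class names `Model` and `GridPointRecord` are hypothetical, not part of the design above.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Model:
    """A ground-cover model: what is on the ground and in what state."""
    identifier: str  # e.g. "wheat", "rice", "rock"
    state: str       # e.g. "dormant", "germinating", "ready for harvest"


@dataclass
class GridPointRecord:
    """Hypothetical per-grid-point record keeping the k closest-matching models."""
    k: int
    # Distances are negated so the heap top is the worst match currently kept.
    _heap: list = field(default_factory=list, init=False, repr=False)

    def offer(self, model: Model, distance: float) -> None:
        """Consider one candidate model; keep it only if it is among the k closest."""
        entry = (-distance, model.identifier, model.state)
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, entry)
        elif distance < -self._heap[0][0]:
            heapq.heapreplace(self._heap, entry)  # evict the current worst match

    def closest(self) -> list[tuple[Model, float]]:
        """Retained models with their distances, nearest first."""
        ranked = sorted(self._heap, key=lambda e: -e[0])
        return [(Model(ident, st), -neg_d) for neg_d, ident, st in ranked]
```

For example, with `k=2`, offering ("wheat", "germinating") at distance 0.4, ("rice", "dormant") at 0.9, and ("wheat", "dormant") at 0.1 leaves the two wheat models, nearest first. The heap keeps memory per grid point fixed at k entries regardless of how many models are scored.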

The resulting data rates generated by the satellites are summarized in Figure 4.2. Note that if the down link rate per satellite were increased to bits per second, nearly every grid point could be sampled each day.
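The claim about down link rates rests on a simple balance: the bits the fleet can send in a day, divided by the size of one compressed grid point image, gives the number of grid points sampled per day. A minimal sketch of that arithmetic follows, assuming every satellite sustains the same rate around the clock; all parameter names are hypothetical and no values from Figure 4.2 are assumed.

```python
SECONDS_PER_DAY = 86_400


def fraction_sampled_per_day(n_satellites: int, downlink_bits_per_s: float,
                             bits_per_image: float, n_grid_points: int) -> float:
    """Fraction of grid points whose compressed image can be sent down in one day.

    Assumes continuous, equal down link rates for all satellites; capped at 1.
    """
    images_per_day = n_satellites * downlink_bits_per_s * SECONDS_PER_DAY / bits_per_image
    return min(1.0, images_per_day / n_grid_points)


def downlink_rate_for_full_coverage(n_satellites: int, bits_per_image: float,
                                    n_grid_points: int) -> float:
    """Per-satellite rate (bits/s) needed to sample every grid point once a day."""
    return n_grid_points * bits_per_image / (n_satellites * SECONDS_PER_DAY)
```

Plugging the figure's per-satellite rate and image size into `fraction_sampled_per_day` shows how close current rates come to daily global coverage.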

The signal processing will typically require computing the Fourier transform of the image and applying filtering operations to the resulting spectral coefficients. Using some distance metric, the filtered coefficients are then compared against a table of spectral coefficients for the different identifier/state combinations, and the "closest matching models" are selected.
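The pipeline in the preceding paragraph (transform, filter, compare, select) can be sketched with NumPy. This is an illustration under stated assumptions, not the method of the original design: cropping the central block of the shifted spectrum stands in for the unspecified filtering, coefficient magnitudes form the feature vector, and Euclidean distance is the assumed metric. The function names are hypothetical.

```python
import numpy as np


def spectral_features(image: np.ndarray, keep: int) -> np.ndarray:
    """FFT an image and keep the central keep x keep block of frequencies.

    Cropping around the zero frequency acts as a crude low-pass filter
    (an assumed stand-in for the unspecified filtering step).
    """
    centered = np.fft.fftshift(np.fft.fft2(image))  # zero frequency moved to the center
    r0 = centered.shape[0] // 2 - keep // 2
    c0 = centered.shape[1] // 2 - keep // 2
    window = centered[r0:r0 + keep, c0:c0 + keep]
    return np.abs(window).ravel()  # coefficient magnitudes as the feature vector


def closest_models(features: np.ndarray, model_table: np.ndarray, k: int):
    """Return (indices, distances) of the k rows of model_table nearest to features.

    model_table holds one row of filtered spectral coefficients per
    identifier/state combination; Euclidean distance is the assumed metric.
    """
    distances = np.linalg.norm(model_table - features, axis=1)
    candidates = np.argpartition(distances, k - 1)[:k]     # k smallest, unordered
    order = candidates[np.argsort(distances[candidates])]  # nearest first
    return order, distances[order]
```

In use, `model_table` would be built by applying `spectral_features` to one template image per identifier/state combination. `np.argpartition` avoids fully sorting all model distances when only a handful of closest matches are retained.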

For simplicity, we assume that the signal processing required is a small integer multiple of the cost of performing a Fourier transform of the image attached to each grid point. Each of these images is a array of intensities. Assuming an point fast Fourier transform takes floating point operations, the signal processing required per grid point is floating point operations. Since the satellites can transmit images per second, we will need roughly floating point operations per second just to perform the basic signal processing. This computation, however, is highly parallelizable and requires little interprocessor communication, since each image can be processed independently.
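The estimate above multiplies three quantities: the cost of one FFT, the small integer multiple, and the image rate. A minimal sketch follows, assuming the conventional 5 N log2 N operation count for a complex N-point FFT (a common benchmark convention, not necessarily the figure used in the original). For an n by n image this gives 5 n² log2(n²), which agrees with doing n-point FFTs along all rows and then all columns.

```python
import math


def fft_flops(n_points: int) -> float:
    """Conventional 5 N log2 N operation count for a complex N-point FFT (assumed)."""
    return 5 * n_points * math.log2(n_points)


def signal_processing_load(image_side: int, multiple: int, images_per_second: float):
    """Return (operations per image, sustained operations per second).

    Processing is assumed to cost `multiple` FFTs of an image_side x image_side
    image, treated as an N = image_side**2 point transform.
    """
    per_image = multiple * fft_flops(image_side * image_side)
    return per_image, per_image * images_per_second
```

For instance, a 32 by 32 image (N = 1024) costs 51,200 operations per FFT under this convention. Because each image is independent, the per-second total can be divided across processors with almost no communication.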

For template matching, each image must be matched against total models. The image at each grid point is represented by spectral coefficients, and the distance between these coefficients and the corresponding coefficients of each model must be computed. The simple-minded approach of computing the distance between the image and every model requires a total of operations per grid point. Since the satellites provide images each second, comparisons will be required per second. Unless the comparison algorithm is improved, this cost dominates the signal processing cost. Again, this computation is highly parallelizable. Storing the data required by the closest matching models will require bits.
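The brute-force matching cost scales as models times coefficients per image, which is why it can overtake the FFT-based processing. A minimal sketch of the count follows, assuming a fixed number of operations per coefficient per model (the default of 1 is an assumption; a squared-difference distance costs roughly three). Function names are hypothetical.

```python
def matching_load(n_models: int, n_coeffs: int, images_per_second: float,
                  ops_per_coeff: int = 1):
    """Return (operations per grid point, operations per second) for brute force.

    Every image is compared with every model over all n_coeffs coefficients;
    ops_per_coeff is an assumed constant cost per coefficient comparison.
    """
    per_grid_point = n_models * n_coeffs * ops_per_coeff
    return per_grid_point, per_grid_point * images_per_second


def matching_dominates(n_models: int, n_coeffs: int, fft_ops_per_image: float,
                       ops_per_coeff: int = 1) -> bool:
    """True when brute-force matching costs more per image than the FFT processing."""
    return n_models * n_coeffs * ops_per_coeff > fft_ops_per_image
```

Any smarter search (for example, pruning models that cannot be among the closest) reduces only the `n_models` factor; the linear dependence on coefficient count remains.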

A database holding one set of models per grid point would require bits. The satellites will produce bits of model data per day (since not every grid point is modeled each day). Thirty years of modeled data will therefore require bits, or one petabyte.
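The storage figures follow from two pieces of arithmetic: the size of one snapshot of closest models, and a daily rate accumulated over years. A minimal sketch follows, assuming each retained model is stored only as an index into the table of identifier/state combinations (match distances are not stored) and using decimal units (1 petabyte = 10^15 bytes). All names are hypothetical.

```python
BITS_PER_BYTE = 8
DAYS_PER_YEAR = 365.25


def bits_per_model_index(n_combinations: int) -> int:
    """Bits needed to name one of n identifier/state combinations."""
    return max(1, (n_combinations - 1).bit_length())


def snapshot_bits(n_grid_points: int, k_closest: int, n_combinations: int) -> int:
    """One set of k closest-model indices for every grid point (assumed layout)."""
    return n_grid_points * k_closest * bits_per_model_index(n_combinations)


def archive_bytes(bits_per_day: float, years: float) -> float:
    """Total bytes accumulated at a constant daily rate."""
    return bits_per_day * DAYS_PER_YEAR * years / BITS_PER_BYTE


def to_petabytes(n_bytes: float) -> float:
    return n_bytes / 1e15
```

With the daily model rate quoted above, `to_petabytes(archive_bytes(rate, 30))` reproduces the thirty-year estimate.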

It is not sufficient to preserve only the transformed (modeled) versions of the data. Over time, improved templates will be developed, and it will then be useful to redo the template matching. Each day, the satellites produce bits of compressed grid point images. Over a 30-year period, the raw data will amount to at least bits, or one exabyte.
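Keeping the raw archive is only useful if it can be reprocessed in reasonable time when templates improve. A minimal sketch of that estimate follows, assuming the per-image cost of rerunning signal processing and template matching is known and the machine sustains a fixed operation rate; decompression and I/O time are ignored. The function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def reprocessing_days(raw_bits: float, bits_per_image: float,
                      ops_per_image: float, sustained_ops_per_s: float) -> float:
    """Days to rerun template matching over an archive of compressed images.

    Counts compute only; reading and decompressing the archive are ignored.
    """
    n_images = raw_bits / bits_per_image
    return n_images * ops_per_image / sustained_ops_per_s / SECONDS_PER_DAY
```

Because the result scales linearly with archive size, reprocessing thirty years of data takes proportionally longer than keeping up with one day's intake at the same compute rate.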





gcf@npac.syr.edu