Güralp Systems sensors and data modules use Güralp Compressed Format (GCF) to share seismic data. The format can be used for data storage or transmission over a serial link or TCP/UDP network.
This section specifies the GCF format.
A GCF file or stream consists of a sequence of blocks, which can be up to 1024 bytes long. Each block consists of a 16-byte header followed by either
a series of data records, containing initial and final sample values and a sequence of first differences between intervening sample values, or
status information in ASCII text format.
The format of the block's body is determined by information in the header.
The header is 16 bytes long, split into four 4-byte fields:
System ID : If the top bit of this field is unset, the bottom 31 bits of this field specify a label of up to 6 characters identifying the originating system, encoded as a base 36 number. The base 36 digit in each position corresponds to a single character (0 – “0” … 9 – “9” , 10 – “A” … 35 – “Z”), with the least significant digit placed at the right-hand end of the string.
If the most significant base 36 digits are zero, they are omitted from the string; hence the encoding 1 corresponds to the string 1, not 000001.
For example, the string HPA1 would be encoded as the number (17 × 36 × 36 × 36) + (25 × 36 × 36) + (10 × 36) + 1 = 825913.
The value is a non-negative 31-bit integer (maximum 2147483647), allowing System IDs up to ZIK0ZJ.
If the top bit of this field is set, the bottom 26 bits of this field specify a label of up to 5 characters using the same encoding. In this case, the value is a non-negative 26-bit integer, allowing System IDs up to ZZZZZ (60466175). Bits 26 – 30 of the System ID field are reserved in this case.
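The base 36 encoding described above can be sketched in Python as follows. The function names (`encode_base36`, `decode_base36`, `decode_system_id`) are illustrative only and are not part of any Güralp software; returning "0" for a zero value is also an assumption, since the text does not say how an all-zero label is displayed.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base36(label: str) -> int:
    """Encode a label such as 'HPA1' as a base 36 number."""
    value = 0
    for ch in label.upper():
        value = value * 36 + DIGITS.index(ch)
    return value

def decode_base36(value: int) -> str:
    """Decode a base 36 number, omitting leading zero digits."""
    if value == 0:
        return "0"  # assumption: show a single zero rather than an empty string
    chars = []
    while value > 0:
        value, digit = divmod(value, 36)
        chars.append(DIGITS[digit])
    return "".join(reversed(chars))

def decode_system_id(field: int) -> str:
    """Decode the 32-bit System ID field, honouring the top-bit flag."""
    if field & 0x80000000:
        # Top bit set: only the bottom 26 bits carry the label.
        return decode_base36(field & 0x03FFFFFF)
    # Top bit unset: the bottom 31 bits carry the label.
    return decode_base36(field & 0x7FFFFFFF)
```

For instance, `encode_base36("HPA1")` reproduces the worked value 825913 given above.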
Stream ID : a unique 6-character label identifying the device, component and sample rate:
Device serial number
The 6-character label is encoded in 4 bytes as a base 36 number, in the same manner as the System ID, above.
Date code : The date and time when the data in the block begin, expressed as a 32-bit number where
the bottom 17 bits are the time in seconds since midnight, normally between 0 and 86399 but possibly 86400 or 86401 on days when leap seconds are inserted; and
the top 15 bits are the Güralp day number, with day zero on 17 November 1989. The day number increments when the seconds number rolls over at midnight.
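As a rough illustration, the date code can be unpacked with a shift and a mask. This is a minimal sketch; the name `decode_date_code` is hypothetical, and because Python's `datetime` has no representation for leap seconds, a seconds value of 86400 or 86401 simply rolls over into the following day here.

```python
from datetime import datetime, timedelta

# Day zero of the Güralp day number, as stated in the text.
GURALP_EPOCH = datetime(1989, 11, 17)

def decode_date_code(code: int) -> datetime:
    """Split the 32-bit date code into day number and seconds since midnight."""
    days = (code >> 17) & 0x7FFF   # top 15 bits
    seconds = code & 0x1FFFF       # bottom 17 bits
    return GURALP_EPOCH + timedelta(days=days, seconds=seconds)
```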
Data format : The final header element contains 4 bytes defining the format of the data in the block:
Number of data records
Sample rate : The sample rate of the data records, an integer number of samples per second. If this field is zero, the block body contains status information in ASCII text format.
Compression code : A code for the compression format used for all the data in the block. Currently accepted values are:
1 : Data records contain one 32-bit difference
2 : Data records contain two 16-bit differences
4 : Data records contain four 8-bit differences
Other values are reserved for future expansion.
Number of data records : The number of 32-bit data records in the block. Combining this with the compression code enables you to find out the total number of sample points in the block; dividing by the sample rate allows the time duration of the block to be determined. You should check for status blocks (which have Sample rate = 0) before performing this calculation. For status blocks, the number of characters is Number of data records × 4.
The block duration is always a whole number of seconds, and always starts on a whole second boundary. A data block has a maximum size of 1024 bytes. If the compression algorithm changes the compression format, blocks may appear which are not filled to the maximum capacity.
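The calculation described above might look like the following sketch. It takes the three header values as already-extracted integers, so it makes no claim about their byte positions within the field. It assumes that compression codes 1, 2 and 4 correspond to one, two and four differences per 32-bit record; the function name `block_extent` is illustrative.

```python
def block_extent(sample_rate: int, compression_code: int, n_records: int) -> dict:
    """Return the size of a block's contents from its Data format values."""
    if sample_rate == 0:
        # Status block: the body is ASCII text, four characters per record.
        return {"kind": "status", "characters": n_records * 4}
    if compression_code not in (1, 2, 4):
        raise ValueError(f"reserved compression code: {compression_code}")
    # Assumption: the code equals the number of differences per record.
    n_samples = n_records * compression_code
    return {
        "kind": "data",
        "samples": n_samples,
        "duration_s": n_samples / sample_rate,
    }
```

Checking for `sample_rate == 0` first avoids a division by zero on status blocks.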
The rest of the block contains the data fields.
The body of a GCF block contains
the absolute value of the first sample, a 32-bit signed integer referred to as the Forward Integrating Constant or FIC (this is not a data record, and is not counted in the Number of data records);
a series of differences between following samples—32-bit, 16-bit or 8-bit signed integers according to the Compression code;
the absolute value of the last sample, a 32-bit signed integer, referred to as the Reverse Integrating Constant or RIC. The RIC can be used as a checksum, as it should match the last decompressed sample value. It is not a sample in itself.
Because the value of the first sample is used as the FIC, the first difference that follows it is always zero.
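A minimal decompression sketch for a data block body is shown below. It assumes big-endian values within the block and that compression codes 1, 2 and 4 select 32-bit, 16-bit and 8-bit differences respectively; `decode_body` is a hypothetical name, not an existing API.

```python
import struct

# Assumed mapping from compression code to a struct format character.
_DIFF_FORMAT = {1: "i", 2: "h", 4: "b"}

def decode_body(body: bytes, compression_code: int, n_records: int) -> list:
    """Rebuild sample values from FIC + differences and verify against the RIC."""
    fmt = _DIFF_FORMAT[compression_code]
    n_diffs = n_records * compression_code      # differences in the block
    (fic,) = struct.unpack_from(">i", body, 0)  # first absolute sample
    diffs = struct.unpack_from(f">{n_diffs}{fmt}", body, 4)
    # Each record is 32 bits, so the RIC follows n_records * 4 bytes of differences.
    (ric,) = struct.unpack_from(">i", body, 4 + n_records * 4)

    samples = []
    value = fic
    for d in diffs:          # the first difference is zero, so samples[0] == fic
        value += d
        samples.append(value)

    if value != ric:
        raise ValueError(f"RIC mismatch: decoded {value}, expected {ric}")
    return samples
```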
Note: When data from a 24-bit digitiser are compressed into GCF blocks using compression code 1 (32-bit data records) and the blocks are transmitted over a serial link, the digitiser may truncate the 32-bit differences to 24 bits by omitting the top byte of each difference. This does not apply to true 32-bit data, even for blocks where the differences would fit into 24 bits.
This optimization is made purely to use the serial link more efficiently, and is not part of the GCF file format. When such data are stored on a disk or transmitted over a network using TCP or UDP, they must be expanded back to the full 32 bits.
Because 24-bit values have a range of 16 million (±8 million), differences between them must be able to indicate ±16 million, which is 25 bits. However, the serial compression code discards the 25th bit. When decompressing the data, you will need to reconstitute this bit.
You can check that you have correctly reconstituted the sign bits by comparing the value you obtain at the end of the GCF block with the RIC.
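The text does not prescribe how to reconstitute the missing bit. One possible approach, sketched below, relies on the assumption that every sample lies in the signed 24-bit range: the running sum is then known exactly once it is reduced modulo 2^24, whatever the discarded upper bits of each difference were. The names `wrap24` and `expand_truncated` are illustrative, and the three remaining bytes of each difference are assumed to arrive most significant byte first.

```python
def wrap24(value: int) -> int:
    """Fold an integer into the signed 24-bit range -8388608..8388607."""
    return ((value + 0x800000) & 0xFFFFFF) - 0x800000

def expand_truncated(fic: int, diffs24: bytes) -> list:
    """Rebuild 24-bit samples from 3-byte truncated differences."""
    samples = []
    value = fic
    for i in range(0, len(diffs24), 3):
        low24 = int.from_bytes(diffs24[i:i + 3], "big")  # low 24 bits only
        value = wrap24(value + low24)                    # upper bits are implied
        samples.append(value)
    return samples
```

The full 32-bit differences can then be recomputed from consecutive samples, and the last sample compared with the RIC as described above.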
GCF blocks can be embedded in TCP or UDP packets for transmission over a network to other Güralp instruments or computers running GCF-compatible software such as Scream!.
A GCF server is expected to understand control messages, and to respond with data as appropriate. This allows clients to start and stop data transfer and to request that any missing blocks be re-transmitted.
The default behaviour of a GCF server is to listen for UDP packets containing GCF commands. A client sends a command as a null-terminated string in a single UDP packet. The commands a client can issue and the expected responses are:
GCFPING : acknowledge this packet;
GCFSEND:B : start sending data in big-endian ("network") byte order;
GCFSEND:L : start sending data in little-endian ("Intel") byte order;
GCFSEND : start sending data, in default byte order (Scream! interprets this as GCFSEND:B).
The server responds to any of these with a packet containing the string GCFACKN, again null-terminated.
A client should continue issuing GCFSEND packets periodically while it still wants to be sent data. If a client does not issue a GCFSEND packet for a long period, the server will stop sending data to it.
If the server has active clients when it shuts down, it sends a packet containing the string GCFNOSV to all of the clients.
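The command exchange above can be sketched with a plain UDP socket. This is an illustration only: the helper names are hypothetical, and the host, port and timeout are parameters the caller must supply, since the text does not fix a port number.

```python
import socket

def build_command(name: str) -> bytes:
    """Return a command such as 'GCFSEND:B' as a null-terminated string."""
    return name.encode("ascii") + b"\x00"

def parse_reply(payload: bytes) -> str:
    """Return the text of a reply up to (not including) the first null byte."""
    return payload.split(b"\x00", 1)[0].decode("ascii", errors="replace")

def ping(host: str, port: int, timeout: float = 2.0) -> bool:
    """Send GCFPING and report whether the server answered GCFACKN."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_command("GCFPING"), (host, port))
        try:
            payload, _addr = sock.recvfrom(2048)
        except socket.timeout:
            return False
    return parse_reply(payload) == "GCFACKN"
```

A client that wants data would send `build_command("GCFSEND:B")` in the same way and repeat it periodically, watching for a `GCFNOSV` reply that signals server shutdown.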
Once the client has sent a GCFSEND packet, the server will start sending it GCF blocks, each in a single UDP packet. There are two formats used, “Version 31” and “Version 40”.
A Version 31 packet consists of:
1024 octets – The GCF block as described above.
1 octet – The version number (decimal 31)
1 octet – The length of the following string, in octets
32 octets – A string describing the source of the data, padded to length 32 with zeros. The string takes the form STREAM-ID/COMxx/HOSTNAME where COMxx is the name of the serial port (COM1, etc.) and STREAM-ID is as in the GCF block (see above)
2 octets – The sequence number, in the byte order specified below (same as for data)
1 octet – A code for the byte order, 1 = big-endian (default), 2 = little-endian
A Version 40 packet consists of:
1024 octets – The GCF block.
1 octet – The version number (decimal 40)
1 octet – A code for the byte order, as above
2 octets – The sequence number, in the byte order specified
1 octet – The length of the following string, in octets
48 octets – A string describing the data source, as above but padded to length 48 with zeros.
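The two trailer layouts can be told apart by the version octet that follows the 1024-octet block. The sketch below parses either layout using the field order listed above; the class and function names (`GcfUdpPacket`, `parse_packet`) are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class GcfUdpPacket:
    version: int
    block: bytes      # the 1024-octet GCF block
    source: str       # STREAM-ID/COMxx/HOSTNAME
    sequence: int
    byte_order: str   # "big" or "little"

_ORDER = {1: "big", 2: "little"}

def parse_packet(packet: bytes) -> GcfUdpPacket:
    """Parse a Version 31 or Version 40 UDP packet."""
    if len(packet) <= 1024:
        raise ValueError("packet too short to contain a version octet")
    version = packet[1024]
    if version == 31:
        expected = 1061
        length = packet[1025]
        raw = packet[1026:1058]          # 32-octet padded string
        seq_bytes = packet[1058:1060]
        order_code = packet[1060] if len(packet) >= expected else 0
    elif version == 40:
        expected = 1077
        order_code = packet[1025]
        seq_bytes = packet[1026:1028]
        length = packet[1028] if len(packet) >= expected else 0
        raw = packet[1029:1077]          # 48-octet padded string
    else:
        raise ValueError(f"unknown packet version {version}")
    if len(packet) < expected:
        raise ValueError("packet truncated")
    if order_code not in _ORDER:
        raise ValueError(f"unknown byte order code {order_code}")
    order = _ORDER[order_code]
    source = raw[:length].decode("ascii", errors="replace")
    return GcfUdpPacket(version, packet[:1024], source,
                        int.from_bytes(seq_bytes, order), order)
```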
Because the UDP packets have sequence numbers, the client can detect when a packet has been lost. To recover this data, it initiates a TCP connection to the server, on the same port number. If the server accepts the connection, the client sends a single-byte command to the server. The commands available are:
0xf9 (249) : Requests the server to use TCP only when communicating with this client. Subsequent GCF data packets will be sent over the active TCP connection, with the same format as the normal UDP packets.
This feature is supported by the server in Scream! 4.0 and above.
0xfc (252) : Requests a version string, returned in PASCAL format (i.e. first octet = length, remainder = null-terminated string)
0xfe (254) : Requests the oldest sequence number held by the server. The sequence number is returned as 2 octets in big-endian byte order
0xff (255) sequence-number : Requests a block by sequence number. The block is returned on the TCP connection in one of the above formats. If the block is not available, the server returns 0xff 0xff 0xff 0xff. The sequence number is 2 octets long, with big-endian byte order.
When the client has all the blocks it needs, it is free to close the connection.
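A client-side sketch of the recovery logic follows. It assumes that the 2-octet sequence number wraps from 65535 to 0, which the text implies but does not state; the helper names are hypothetical.

```python
def missing_sequences(last_seen: int, received: int) -> list:
    """List the sequence numbers skipped between two consecutive arrivals."""
    gap = (received - last_seen) % 65536   # assumed wrap at 16 bits
    return [(last_seen + i) % 65536 for i in range(1, gap)]

def build_block_request(sequence: int) -> bytes:
    """Command 0xff followed by the sequence number, big-endian."""
    return bytes([0xFF]) + sequence.to_bytes(2, "big")

def is_unavailable(reply: bytes) -> bool:
    """True if the server answered with the 'block not available' marker."""
    return reply[:4] == b"\xff\xff\xff\xff"
```

Each entry returned by `missing_sequences` would be requested in turn over the TCP connection with `build_block_request`.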
Before it can be sent over a serial link, each GCF data block must be packaged in a 'transport layer', which consists of a 4-byte transmission header and a 2-byte checksum tail.
The transmission header consists of 4 bytes:
ASCII G (0x47)
Block sequence number
Block size (MSB)
Block size (LSB)
Block sequence number : An unsigned integer, which increments by 1 after each block, wrapping round to 0 from 255.
Block size : The size of the block in bytes, excluding the transmission header and checksum tail, but including the GCF block header.
After the transmission header, the GCF block is sent (with a length equal to the number of bytes specified in Block Size), followed by a two-byte checksum. This value is the sum of all bytes in the block header and body, modulo 65536, and is presented in big-endian byte order.
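The framing just described can be sketched as below. The function names are illustrative, and the block passed in is assumed to be the GCF header plus body exactly as it will be sent.

```python
def checksum(block: bytes) -> int:
    """Sum of all bytes in the block header and body, modulo 65536."""
    return sum(block) % 65536

def build_frame(sequence: int, block: bytes) -> bytes:
    """Wrap a GCF block in the 4-byte header and 2-byte checksum tail."""
    header = (
        b"G"                              # ASCII G (0x47)
        + bytes([sequence % 256])         # block sequence number, wraps at 255
        + len(block).to_bytes(2, "big")   # block size, MSB first
    )
    return header + block + checksum(block).to_bytes(2, "big")
```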
To optimise the use of available transmitter bandwidth, the transmitted data block is truncated to the actual data length. This reduction is only applied to the difference records: the first and last absolute values are still transmitted as 32-bit values.
After transmission of the block is complete, the transmitter waits a nominal 150 ms for either a NACK or ACK signal before starting to transmit the next block.
If it receives an ACK, transmission of the next block commences immediately. If NACK is received, the block is re-transmitted.
The receiver should read the block, calculate the data checksum, and compare that with the value sent in the checksum tail. If the two values match, the receiver should send an ACK signal, otherwise a NACK.
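On the receiving side, the check might be sketched as follows. `check_frame` is a hypothetical helper that takes one complete frame (transmission header, block and checksum tail) and reports whether an ACK or a NACK is appropriate; reading the bytes from the serial port is left out.

```python
def check_frame(frame: bytes):
    """Return (sequence, block, ok) for one complete serial frame."""
    if len(frame) < 6 or frame[0:1] != b"G":
        raise ValueError("not a GCF serial frame")
    sequence = frame[1]
    size = int.from_bytes(frame[2:4], "big")       # block size, MSB first
    if len(frame) != 4 + size + 2:
        raise ValueError("frame length does not match block size")
    block = frame[4:4 + size]
    sent = int.from_bytes(frame[4 + size:], "big")  # checksum, big-endian
    ok = sum(block) % 65536 == sent                 # True -> ACK, False -> NACK
    return sequence, block, ok
```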
ACK and NACK signals can be in two formats: an older 2-byte format and a newer 6-byte format. The 2-byte signal is simply the first 2 bytes of the 6-byte signal. When both the transmitter and receiver use the 6-byte version, the link is said to be operating in BRP (Block Recovery Protocol) mode, which allows recovery of lost data (from up to 255 blocks ago) as well as error correction on a per-block basis.
The format of an ACK or NACK message is
ACK : Stream ID LSB, ^S (0x13) or 0, Stream ID MSB
NACK : Stream ID LSB, Block number, Stream ID MSB
Stream ID should match the 4-byte stream ID of the last received block. Each system is able to identify its own ACK or NACK by matching this ID. 2-byte systems only match the least significant byte and ignore the rest.
For a NACK message, Block number specifies which block number to rewind to. For an ACK message, the third byte is either unused (0), or 0x13 (ASCII DC3, XOFF or ^S) to switch to command mode.
DM24 digitiser units transmit data as complete blocks become available, without any flow control apart from ACK and NACK.