Summary

The ATSC Digital Television Standard (Document A/53) defines the complete technical framework for digital HDTV broadcasting in the United States, covering video, audio, transport, and RF transmission systems. The standard specifies MPEG-2 video compression, AC-3 audio coding, 8-VSB modulation, and Reed-Solomon error correction for terrestrial broadcast.

Source document circa 1995 preserved as-is
Doc. A/53

12 Apr 95

16 Sep 95

ATSC Digital Television Standard


Table of Contents

List of Figures

List of Tables

FOREWORD

1. SCOPE & DOCUMENTATION STRUCTURE

1.1 Scope

1.2 Documentation structure

2. REFERENCES

3. DEFINITIONS

3.1 Definitions

3.2 Compliance notation

3.3 Treatment of syntactic elements

3.4 Terms employed

3.5 Symbols, abbreviations, and mathematical operators

3.5.1 Introduction

3.5.2 Arithmetic operators

3.5.3 Logical operators

3.5.4 Relational operators

3.5.5 Bitwise operators

3.5.6 Assignment

3.5.7 Mnemonics

3.5.8 Constants

3.5.9 Method of describing bit stream syntax

3.5.9.1 Definition of bytealigned function

3.5.9.2 Definition of nextbits function

3.5.9.3 Definition of next_start_code function

4. BACKGROUND

4.1 Advanced Television Systems Committee (ATSC)

4.2 Advisory Committee on Advanced Television Service (ACATS)

4.3 Digital HDTV Grand Alliance (Grand Alliance)

4.4 Organization for documenting the Digital Television Standard

4.5 Principles for documenting the Digital Television Standard

5. SYSTEM OVERVIEW

5.1 Objectives

5.2 System block diagram

ANNEX A - VIDEO SYSTEMS CHARACTERISTICS (Normative)

1. SCOPE

2. REFERENCES

2.1 Normative references

2.2 Informative references

3. COMPLIANCE NOTATION

4. POSSIBLE VIDEO INPUTS

5. SOURCE CODING SPECIFICATION

5.1 Constraints with respect to ISO/IEC 13818-2 Main Profile

5.1.1 Sequence header constraints

5.1.2 Compression format constraints

5.1.3 Sequence extension constraints

5.1.4 Sequence display extension constraints

5.1.5 Picture header constraints

5.2 Bit stream specifications beyond MPEG-2

5.2.1 Picture extension and user data syntax

5.2.2 Picture user data syntax

5.2.3 Picture user data semantics

ANNEX B - AUDIO SYSTEMS CHARACTERISTICS (Normative)

1. SCOPE

2. NORMATIVE REFERENCES

3. COMPLIANCE NOTATION

4. SYSTEM OVERVIEW

5. SPECIFICATION

5.1 Constraints with respect to ATSC Standard A/52

5.2 Sampling frequency

5.3 Bit rate

5.4 Audio coding modes

5.5 Dialogue level

5.6 Dynamic range compression

6. MAIN AND ASSOCIATED SERVICES

6.1 Overview

6.2 Summary of service types

6.3 Complete main audio service (CM)

6.4 Main audio service, music and effects (ME)

6.5 Visually impaired (VI)

6.6 Hearing impaired (HI)

6.7 Dialogue (D)

6.8 Commentary (C)

6.9 Emergency (E)

6.10 Voice-over (VO)

7. AUDIO ENCODER INTERFACES

7.1 Audio encoder input characteristics

7.2 Audio encoder output characteristics

ANNEX C - SERVICE MULTIPLEX AND TRANSPORT SYSTEMS CHARACTERISTICS (Normative)

1. SCOPE

2. NORMATIVE REFERENCES

3. COMPLIANCE NOTATION

4. SYSTEM OVERVIEW

5. SPECIFICATION

5.1 MPEG-2 Systems standard

5.1.1 Video T-STD

5.1.2 Audio T-STD

5.2 Registration descriptor

5.2.1 Program identifier

5.2.2 Audio elementary stream identifier

5.3 The program paradigm

5.4 Constraints on PSI

5.5 PES constraints

5.5.1 Video PES constraints

5.5.2 Audio PES constraints

5.6 Services and features

5.6.1 Program guide

5.6.1.1 Master program guide PID

5.6.1.2 Program guide STD model

5.6.2 System information

5.6.2.1 System information PID

5.6.2.2 System information STD model

5.6.3 Specification of private data services

5.6.3.1 Verification model

5.6.3.1.1 Syntax and semantics

5.6.3.1.2 Ancillary service target decoder (ASTD)

5.6.3.2 Stream type and PMT descriptors

5.6.3.2.1 Stream type

5.6.3.2.2 PMT descriptors

5.7 Assignment of identifiers

5.7.1 Stream type

5.7.2 Descriptors

5.7.2.1 AC-3 audio descriptor

5.7.2.2 Program smoothing buffer descriptor

5.8 Extensions to the MPEG-2 Systems specification

5.8.1 Scrambling control

6. FEATURES OF 13818-1 NOT SUPPORTED BY THIS STANDARD

6.1 Program streams

6.2 Still pictures

7. TRANSPORT ENCODER INTERFACES AND BIT RATES

7.1 Transport encoder input characteristics

7.2 Transport output characteristics

ANNEX D - RF/TRANSMISSION SYSTEMS CHARACTERISTICS (Normative)

1. SCOPE

2. NORMATIVE REFERENCES

3. COMPLIANCE NOTATION

4. TRANSMISSION CHARACTERISTICS FOR TERRESTRIAL BROADCAST

4.1 Overview

4.2 Channel error protection and synchronization

4.2.1 Prioritization

4.2.2 Data randomizer

4.2.3 Reed-Solomon encoder

4.2.4 Interleaving

4.2.5 Trellis coding

4.2.6 Data segment sync

4.2.7 Data field sync

4.2.7.1 Sync

4.2.7.2 PN511

4.2.7.3 PN63

4.2.7.4 VSB mode

4.2.7.5 Reserved

4.2.7.6 Precode

4.3 Modulation

4.3.1 Bit-to-symbol mapping

4.3.2 Pilot addition

4.3.3 8 VSB modulation method

5. TRANSMISSION CHARACTERISTICS FOR HIGH DATA RATE MODE

5.1 Overview

5.2 Channel error protection and synchronization

5.2.1 Prioritization

5.2.2 Data randomizer

5.2.3 Reed-Solomon encoder

5.2.4 Interleaving

5.2.5 Data segment sync

5.2.6 Data field sync

5.3 Modulation

5.3.1 Bit-to-symbol mapping

5.3.2 Pilot addition

5.3.3 16 VSB modulation method

ANNEX E - RECEIVER CHARACTERISTICS (Informative)

1. SCOPE

2. REFERENCES TO EXISTING OR EMERGING STANDARDS

3. COMPLIANCE NOTATION

4. STATUS OF RECEIVER STANDARDIZATION ACTIVITIES

4.1 Tuner performance

4.1.1 Noise figure

4.1.2 Channelization plan for broadcast and cable

4.1.3 Direct pickup

4.2 Transport

4.3 Decoder interface

4.4 Digital data interface

4.5 Conditional access interface

4.6 Closed captioning

5. RECEIVER FUNCTIONALITY

5.1 Video

5.2 Audio

List of Figures

Figure 5.1. ITU-R digital terrestrial television broadcasting model.

Figure 5.2. High level view of encoding equipment.

Annex A

None

Annex B

Figure 1. Audio subsystem in the digital television system.

Annex C

Figure 1. Sample organization of functionality in a transmitter-receiver pair for a single program.

Figure 2. Ancillary service target decoder.

Annex D

Figure 1. VSB transmitter.

Figure 2. VSB data frame.

Figure 3. VSB channel occupancy (nominal).

Figure 4. Randomizer polynomial.

Figure 5. Reed-Solomon (207,187) t=10 parity generator polynomial.

Figure 6. Convolutional interleaver (byte shift register illustration).

Figure 7. 8 VSB trellis encoder, precoder, and symbol mapper.

Figure 8. Trellis code interleaver.

Figure 9. 8 VSB data segment.

Figure 10. VSB data field sync.

Figure 11. Field sync PN sequence generators.

Figure 12. Nominal VSB system channel response (linear phase raised cosine Nyquist filter).

Figure 13. 16 VSB data segment.

Figure 14. 16 VSB transmitter.

Figure 15. 16 VSB mapper.

Annex E

None

List of Tables

Table 3.1 Next Start Code

Annex A

Table 1 Standardized Video Input Formats

Table 2 Sequence Header Constraints

Table 3 Compression Format Constraints

Table 4 Sequence Extension Constraints

Table 5 Sequence Display Extension Constraints

Table 6 Picture Extension and User Data Syntax

Table 7 Picture User Data Syntax

Annex B

Table 1 Audio Constraints

Table 2 Audio Service Types

Annex C

Table 1 PID Assignment for the Constituent Elementary Streams of a Program

Table 2 Transport Scrambling Control Field

Annex D

Table 1 Interleaving Sequence

Table 2 Byte to Symbol Conversion, Multiplexing of Trellis Encoders

Annex E

None


Foreword

This Standard was prepared by the Advanced Television Systems Committee (ATSC) Technology Group on Distribution (T3). The document was approved by the members of T3 on February 23, 1995 for submission by letter ballot to the membership of the full ATSC as an ATSC Standard. The document was approved by the Members of the ATSC on April 12, 1995. Changes to Annex A, to include standard definition video formats, were approved by the members of T3 on August 4, 1995 and by the Members of the ATSC on September 15, 1995.

This Standard consists of a cover document which provides background information and an overview of the digital television system defined by the Standard. The system consists of various subsystems that are described in the annexes.

1. Scope & documentation structure

1.1 Scope

The Digital Television Standard describes the system characteristics of the U.S. advanced television (ATV) system. The document and its normative annexes provide detailed specifications of the parameters of the system, including: the video encoder input scanning formats and the pre-processing and compression parameters of the video encoder; the audio encoder input signal format and the pre-processing and compression parameters of the audio encoder; the service multiplex and transport layer characteristics and normative specifications; and the VSB RF/transmission subsystem.

1.2 Documentation structure

The documentation of the Digital Television Standard consists of this document which provides a general system overview, a list of reference documents, and sections relating to the system as a whole. The system is modular in concept and the specifications for each of the modules are provided in the appropriate annex.

2. References

Normative references may be found in each normative Annex. The Digital Television Standard is based on the ISO/IEC MPEG-2 Video Standard, the Digital Audio Compression (AC-3) Standard, and the ISO/IEC MPEG-2 Systems Standard. Those references are listed here for the convenience of the reader. In addition, a guide to the use of the Digital Television Standard is listed.

ATSC Standard A/52 (1995), Digital Audio Compression (AC-3).

ATSC Document A/54 (1995), Guide to the Use of the ATSC Digital Television Standard.

ISO/IEC IS 13818-1, International Standard (1994), MPEG-2 Systems.

ISO/IEC IS 13818-2, International Standard (1994), MPEG-2 Video.

3. Definitions

3.1 Definitions

With respect to definition of terms, abbreviations and units, the practice of the Institute of Electrical and Electronics Engineers (IEEE) as outlined in the Institute's published standards shall be used. Where an abbreviation is not covered by IEEE practice, or industry practice differs from IEEE practice, then the abbreviation in question will be described in Section 3.4 of this document. Many of the definitions included therein are derived from definitions adopted by MPEG.

3.2 Compliance notation

As used in this document, "shall" or "will" denotes a mandatory provision of the standard. "Should" denotes a provision that is recommended but not mandatory. "May" denotes a feature whose presence does not preclude compliance and that may or may not be present at the option of the implementor.

3.3 Treatment of syntactic elements

This document contains symbolic references to syntactic elements used in the audio, video, and transport coding subsystems. These references are typographically distinguished by the use of a different font (e.g., restricted), may contain the underscore character (e.g., sequence_end_code) and may consist of character strings that are not English words (e.g., dynrng).

3.4 Terms employed

For the purposes of the Digital Television Standard, the following definitions of terms apply:

ACATS: Advisory Committee on Advanced Television Service.

access unit: A coded representation of a presentation unit. In the case of audio, an access unit is the coded representation of an audio frame. In the case of video, an access unit includes all the coded data for a picture, and any stuffing that follows it, up to but not including the start of the next access unit. If a picture is not preceded by a group_start_code or a sequence_header_code, the access unit begins with the picture start code. If a picture is preceded by a group_start_code and/or a sequence_header_code, the access unit begins with the first byte of the first of these start codes. If it is the last picture preceding a sequence_end_code in the bit stream, all bytes between the last byte of the coded picture and the sequence_end_code (including the sequence_end_code) belong to the access unit.

A/D: Analog to digital converter.

AES: Audio Engineering Society.

anchor frame: A video frame that is used for prediction. I-frames and P-frames are generally used as anchor frames, but B-frames are never anchor frames.

ANSI: American National Standards Institute.

Asynchronous Transfer Mode (ATM): A digital signal protocol for efficient transport of both constant-rate and bursty information in broadband digital networks. The ATM digital stream consists of fixed-length packets called "cells," each containing 53 eight-bit bytes: a 5-byte header and a 48-byte information payload.

ATEL: Advanced Television Evaluation Laboratory.

ATM: See asynchronous transfer mode.

ATTC: Advanced Television Test Center.

ATV: The U. S. advanced television system.

bidirectional pictures or B-pictures or B-frames: Pictures that use both future and past pictures as a reference. This technique is termed bidirectional prediction. B-pictures provide the most compression. B-pictures do not propagate coding errors as they are never used as a reference.

bit rate: The rate at which the compressed bit stream is delivered from the channel to the input of a decoder.

block: A block is an 8-by-8 array of pel values or DCT coefficients representing luminance or chrominance information.

bps: Bits per second.

byte-aligned: A bit in a coded bit stream is byte-aligned if its position is a multiple of 8 bits from the first bit in the stream.

CDTV: See conventional definition television.

channel: A digital medium that stores or transports a digital television stream.

coded representation: A data element as represented in its encoded form.

compression: Reduction in the number of bits used to represent an item of data.

constant bit rate: Operation where the bit rate is constant from start to finish of the compressed bit stream.

conventional definition television (CDTV): This term is used to signify the analog NTSC television system as defined in ITU-R Recommendation 470. See also standard definition television and ITU-R Recommendation 1125.

CRC: Cyclic redundancy check, used to verify the correctness of the data.

D-frame: Frame coded according to an MPEG-1 mode which uses DC coefficients only.

data element: An item of data as represented before encoding and after decoding.

DCT: See discrete cosine transform.

decoded stream: The decoded reconstruction of a compressed bit stream.

decoder: An embodiment of a decoding process.

decoding (process): The process defined in the Digital Television Standard that reads an input coded bit stream and outputs decoded pictures or audio samples.

decoding time-stamp (DTS): A field that may be present in a PES packet header that indicates the time that an access unit is decoded in the system target decoder.

digital storage media (DSM): A digital storage or transmission device or system.

discrete cosine transform: A mathematical transform that can be perfectly undone and which is useful in image compression.

DSM-CC: Digital storage media command and control.

DSM: Digital storage media.

DTS: See decoding time-stamp.

DVCR: Digital video cassette recorder.

ECM: See entitlement control message.

editing: A process by which one or more compressed bit streams are manipulated to produce a new compressed bit stream. Conforming edited bit streams are understood to meet the requirements defined in the Digital Television Standard.

elementary stream (ES): A generic term for one of the coded video, coded audio or other coded bit streams. One elementary stream is carried in a sequence of PES packets with one and only one stream_id.

elementary stream clock reference (ESCR): A time stamp in the PES Stream from which decoders of PES streams may derive timing.

EMM: See entitlement management message.

encoder: An embodiment of an encoding process.

encoding (process): A process that reads a stream of input pictures or audio samples and produces a valid coded bit stream as defined in the Digital Television Standard.

entitlement control message (ECM): Entitlement control messages are private conditional access information which specify control words and possibly other stream-specific, scrambling, and/or control parameters.

entitlement management message (EMM): Entitlement management messages are private conditional access information which specify the authorization level or the services of specific decoders. They may be addressed to single decoders or groups of decoders.

entropy coding: Variable length lossless coding of the digital representation of a signal to reduce redundancy.

entry point: Refers to a point in a coded bit stream after which a decoder can become properly initialized and commence syntactically correct decoding. The first transmitted picture after an entry point is either an I-picture or a P-picture. If the first transmitted picture is not an I-picture, the decoder may produce one or more pictures during acquisition.

ES: See elementary stream.

ESCR: See elementary stream clock reference.

event: An event is defined as a collection of elementary streams with a common time base, an associated start time, and an associated end time.

field: For an interlaced video signal, a "field" is the assembly of alternate lines of a frame. Therefore, an interlaced frame is composed of two fields, a top field and a bottom field.

forbidden: This term, when used in clauses defining the coded bit stream, indicates that the value shall never be used. This is usually to avoid emulation of start codes.

FPLL: Frequency and phase locked loop.

frame: A frame contains lines of spatial information of a video signal. For progressive video, these lines contain samples starting from one time instant and continuing through successive lines to the bottom of the frame. For interlaced video a frame consists of two fields, a top field and a bottom field. One of these fields will commence one field later than the other.

GOP: See group of pictures.

Group of pictures (GOP): A group of pictures consists of one or more pictures in sequence.

HDTV: See high definition television.

high definition television (HDTV): High definition television has a resolution of approximately twice that of conventional television in both the horizontal (H) and vertical (V) dimensions and a picture aspect ratio (HxV) of 16:9. ITU-R Recommendation 1125 further defines "HDTV quality" as the delivery of a television picture which is subjectively identical with the interlaced HDTV studio standard.

high level: A range of allowed picture parameters defined by the MPEG-2 video coding specification which corresponds to high definition television.

Huffman coding: A type of source coding that uses codes of different lengths to represent symbols which have unequal likelihood of occurrence.

IEC: International Electrotechnical Commission.

intra-coded pictures or I-pictures or I-frames: Pictures that are coded using information present only in the picture itself and not depending on information from other pictures. I-pictures provide a mechanism for random access into the compressed video data. I-pictures employ transform coding of the pel blocks and provide only moderate compression.

ISO: International Organization for Standardization.

ITU: International Telecommunication Union.

JEC: Joint Engineering Committee of EIA and NCTA.

layer: One of the levels in the data hierarchy of the video and system specification.

level: A range of allowed picture parameters and combinations of picture parameters.

macroblock: In the advanced television system a macroblock consists of four blocks of luminance and one block each of Cr and Cb.

main level: A range of allowed picture parameters defined by the MPEG-2 video coding specification with maximum resolution equivalent to ITU-R Recommendation 601.

main profile: A subset of the syntax of the MPEG-2 video coding specification that is expected to be supported over a large range of applications.

Mbps: 1,000,000 bits per second.

motion vector: A pair of numbers which represent the vertical and horizontal displacement of a region of a reference picture for prediction.

MP@HL: Main profile at high level.

MP@ML: Main profile at main level.

MPEG: Refers to standards developed by the ISO/IEC JTC1/SC29 WG11, Moving Picture Experts Group. MPEG may also refer to the Group.

MPEG-1: Refers to ISO/IEC standards 11172-1 (Systems), 11172-2 (Video), 11172-3 (Audio), 11172-4 (Compliance Testing), and 11172-5 (Technical Report).

MPEG-2: Refers to ISO/IEC standards 13818-1 (Systems), 13818-2 (Video), 13818-3 (Audio), 13818-4 (Compliance).

pack: A pack consists of a pack header followed by zero or more packets. It is a layer in the system coding syntax.

packet data: Contiguous bytes of data from an elementary data stream present in the packet.

packet identifier (PID): A unique integer value used to associate elementary streams of a program in a single or multi-program transport stream.

packet: A packet consists of a header followed by a number of contiguous bytes from an elementary data stream. It is a layer in the system coding syntax.

padding: A method to adjust the average length in time of an audio frame to the duration of the corresponding PCM samples, by conditionally adding a slot to the audio frame.

payload: Payload refers to the bytes which follow the header byte in a packet. For example, the payload of a transport stream packet includes the PES_packet_header and its PES_packet_data_bytes or pointer_field and PSI sections, or private data. A PES_packet_payload, however, consists only of PES_packet_data_bytes. The transport stream packet header and adaptation fields are not payload.

PCR: See program clock reference.

pel: See pixel.

PES packet header: The leading fields in a PES packet up to but not including the PES_packet_data_byte fields where the stream is not a padding stream. In the case of a padding stream, the PES packet header is defined as the leading fields in a PES packet up to but not including the padding_byte fields.

PES packet: The data structure used to carry elementary stream data. It consists of a packet header followed by PES packet payload.

PES Stream: A PES stream consists of PES packets, all of whose payloads consist of data from a single elementary stream, and all of which have the same stream_id.

PES: An abbreviation for packetized elementary stream.

picture: Source, coded or reconstructed image data. A source or reconstructed picture consists of three rectangular matrices representing the luminance and two chrominance signals.

PID: See packet identifier.

pixel: "Picture element" or "pel." A pixel is a digital sample of the color intensity values of a picture at a single point.

predicted pictures or P-pictures or P-frames: Pictures that are coded with respect to the nearest previous I- or P-picture. This technique is termed forward prediction. P-pictures provide more compression than I-pictures and serve as a reference for future P-pictures or B-pictures. P-pictures can propagate coding errors when P-pictures (or B-pictures) are predicted from prior P-pictures where the prediction is flawed.

presentation time-stamp (PTS): A field that may be present in a PES packet header that indicates the time that a presentation unit is presented in the system target decoder.

presentation unit (PU): A decoded audio access unit or a decoded picture.

profile: A defined subset of the syntax specified in the MPEG-2 video coding specification.

program clock reference (PCR): A time stamp in the transport stream from which decoder timing is derived.

program element: A generic term for one of the elementary streams or other data streams that may be included in the program.

program specific information (PSI): PSI consists of normative data which is necessary for the demultiplexing of transport streams and the successful regeneration of programs.

program: A program is a collection of program elements. Program elements may be elementary streams. Program elements need not have any defined time base; those that do have a common time base and are intended for synchronized presentation.

PSI: See program specific information.

PTS: See presentation time-stamp.

PU: See presentation unit.

quantizer: A processing step which intentionally reduces the precision of DCT coefficients.

random access: The process of beginning to read and decode the coded bit stream at an arbitrary point.

reserved: This term, when used in clauses defining the coded bit stream, indicates that the value may be used in the future for Digital Television Standard extensions. Unless otherwise specified within this Standard, all reserved bits shall be set to "1".

SCR: See system clock reference.

scrambling: The alteration of the characteristics of a video, audio or coded data stream in order to prevent unauthorized reception of the information in a clear form. This alteration is a specified process under the control of a conditional access system.

SDTV: See standard definition television.

slice: A series of consecutive macroblocks.

SMPTE: Society of Motion Picture and Television Engineers.

source stream: A single, non-multiplexed stream of samples before compression coding.

splicing: The concatenation, performed on the system level, of two different elementary streams. It is understood that the resulting stream must conform totally to the Digital Television Standard.

standard definition television (SDTV): This term is used to signify a digital television system in which the quality is approximately equivalent to that of NTSC. This equivalent quality may be achieved from pictures sourced at the 4:2:2 level of ITU-R Recommendation 601 and subjected to processing as part of the bit rate compression. The results should be such that when judged across a representative sample of program material, subjective equivalence with NTSC is achieved. Also called standard digital television. See also conventional definition television and ITU-R Recommendation 1125.

start codes: 32-bit codes embedded in the coded bit stream that are unique. They are used for several purposes including identifying some of the layers in the coding syntax. Start codes consist of a 24-bit prefix (0x000001) and an 8-bit stream_id.
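As a non-normative illustration of the layout just described (the helper name start_code() is ours, not the standard's), a start code is simply the 24-bit prefix followed by the 8-bit stream_id:

```python
# Illustrative sketch only; this helper is not defined by the standard.
START_CODE_PREFIX = 0x000001  # 24-bit prefix common to all start codes

def start_code(stream_id: int) -> bytes:
    """Return the 4-byte start code: 24-bit prefix then 8-bit stream_id."""
    return START_CODE_PREFIX.to_bytes(3, "big") + bytes([stream_id])

# For example, the MPEG-2 sequence_header_code uses the value 0xB3,
# producing the byte sequence 00 00 01 B3.
print(start_code(0xB3).hex())
```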

STD input buffer: A first-in, first-out buffer at the input of a system target decoder for storage of compressed data from elementary streams before decoding.

STD: See system target decoder.

still picture: A coded still picture consists of a video sequence containing exactly one coded picture which is intra-coded. This picture has an associated PTS and the presentation time of succeeding pictures, if any, is later than that of the still picture by at least two picture periods.

system clock reference (SCR): A time stamp in the program stream from which decoder timing is derived.

system header: The system header is a data structure that carries information summarizing the system characteristics of the Digital Television Standard multiplexed bit stream.

system target decoder (STD): A hypothetical reference model of a decoding process used to describe the semantics of the Digital Television Standard multiplexed bit stream.

time-stamp: A term that indicates the time of a specific action such as the arrival of a byte or the presentation of a presentation unit.

TOV: Threshold of visibility.

Transport Stream packet header: The leading fields in a Transport Stream packet up to and including the continuity_counter field.

variable bit rate: Operation where the bit rate varies with time during the decoding of a compressed bit stream.

VBV: See video buffering verifier.

Video buffering verifier (VBV): A hypothetical decoder that is conceptually connected to the output of an encoder. Its purpose is to provide a constraint on the variability of the data rate that an encoder can produce.

video sequence: A video sequence is represented by a sequence header, one or more groups of pictures, and an end_of_sequence code in the data stream.

8 VSB: Vestigial sideband modulation with 8 discrete amplitude levels.

16 VSB: Vestigial sideband modulation with 16 discrete amplitude levels.

3.5 Symbols, abbreviations, and mathematical operators

3.5.1 Introduction

The symbols, abbreviations, and mathematical operators used to describe the Digital Television Standard are those adopted for use in describing MPEG-2 and are similar to those used in the "C" programming language. However, integer division with truncation and rounding are specifically defined. The bitwise operators are defined assuming two's-complement representation of integers. Numbering and counting loops generally begin from 0.

3.5.2 Arithmetic operators

+ Addition.

- Subtraction (as a binary operator) or negation (as a unary operator).

++ Increment.

-- Decrement.

* or × Multiplication.

^ Power.

/ Integer division with truncation of the result toward 0. For example, 7/4 and -7/-4 are truncated to 1 and -7/4 and 7/-4 are truncated to -1.

// Integer division with rounding to the nearest integer. Half-integer values are rounded away from 0 unless otherwise specified. For example 3//2 is rounded to 2, and -3//2 is rounded to -2.

DIV Integer division with truncation of the result towards -∞.

% Modulus operator. Defined only for positive numbers.

Sign( ) Sign(x) = 1 for x > 0; 0 for x == 0; -1 for x < 0.

NINT ( ) Nearest integer operator. Returns the nearest integer value to the real-valued argument. Half-integer values are rounded away from 0.

sin Sine.

cos Cosine.

exp Exponential.

√ Square root.

log10 Logarithm to base ten.

loge Logarithm to base e.
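As a non-normative sketch, the division and rounding operators defined above can be written out in Python (the function names are ours; the standard defines only the operator symbols):

```python
import math

def div_truncate(a: int, b: int) -> int:
    """'/': integer division truncating toward 0, e.g. 7/4 -> 1, -7/4 -> -1."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q

def div_round(a: int, b: int) -> int:
    """'//': integer division rounded to nearest; halves rounded away from 0."""
    same_sign = (a >= 0) == (b >= 0)
    q, r = divmod(abs(a), abs(b))
    if 2 * r >= abs(b):  # remainder is half or more: round away from zero
        q += 1
    return q if same_sign else -q

def div_floor(a: int, b: int) -> int:
    """'DIV': integer division truncating toward minus infinity."""
    return a // b  # Python's // already floors

def sign(x) -> int:
    """Sign(x) = 1 for x > 0, 0 for x == 0, -1 for x < 0."""
    return (x > 0) - (x < 0)

def nint(x: float) -> int:
    """NINT( ): nearest integer; half-integer values rounded away from 0."""
    return math.floor(x + 0.5) if x >= 0 else math.ceil(x - 0.5)
```

Note the contrast between the two truncating divisions: -7 / 4 truncates toward zero to -1, while -7 DIV 4 truncates toward minus infinity to -2.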

3.5.3 Logical operators

|| Logical OR.

&& Logical AND.

! Logical NOT.

3.5.4 Relational operators

> Greater than.

≥ Greater than or equal to.

< Less than.

≤ Less than or equal to.

== Equal to.

!= Not equal to.

max [,...,] The maximum value in the argument list.

min [,...,] The minimum value in the argument list.

3.5.5 Bitwise operators

& AND.

| OR.

>> Shift right with sign extension.

<< Shift left with 0 fill.

3.5.6 Assignment

= Assignment operator.

3.5.7 Mnemonics

The following mnemonics are defined to describe the different data types used in the coded bit stream.

The byte order of multi-byte words is most significant byte first.

3.5.8 Constants

π 3.14159265359...

e 2.71828182845...

3.5.9 Method of describing bit stream syntax

Each data item in the coded bit stream described below is in bold type. It is described by its name, its length in bits, and a mnemonic for its type and order of transmission.

The action caused by a decoded data element in a bit stream depends on the value of that data element and on data elements previously decoded. The decoding of the data elements and definition of the state variables used in their decoding are described in the clauses containing the semantic description of the syntax. The following constructs are used to express the conditions when data elements are present, and are in normal type.

Note that this syntax uses the "C" code convention that a variable or expression evaluating to a non-zero value is equivalent to a condition that is true.

As noted, the group of data elements may contain nested conditional constructs. For compactness, the {} are omitted when only one data element follows.

Decoders must include a means to look for start codes and sync bytes (transport stream) in order to begin decoding correctly, and to identify errors, erasures or insertions while decoding. The methods to identify these situations, and the actions to be taken, are not standardized.

3.5.9.1 Definition of bytealigned function

The function bytealigned( ) returns 1 if the current position is on a byte boundary; that is, the next bit in the bit stream is the first bit in a byte. Otherwise it returns 0.

3.5.9.2 Definition of nextbits function

The function nextbits( ) permits comparison of a bit string with the next bits to be decoded in the bit stream.

3.5.9.3 Definition of next_start_code function

The next_start_code( ) function removes any zero bit and zero byte stuffing and locates the next start code.

This function checks whether the current position is byte-aligned. If it is not, 0 stuffing bits are present. After that any number of 0 bytes may be present before the start-code. Therefore start-codes are always byte-aligned and may be preceded by any number of 0 stuffing bits.

Table 3.1 Next Start Code
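The three syntax functions above can be sketched in C, the language whose conventions this syntax borrows. The BitReader type and helper layout are illustrative assumptions for this sketch, not part of the standard:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t *data;   /* coded bit stream */
    size_t len;            /* length in bytes */
    size_t bitpos;         /* current position, in bits from the start */
} BitReader;

/* bytealigned(): returns 1 if the next bit is the first bit of a byte */
static int bytealigned(const BitReader *br) {
    return (br->bitpos % 8) == 0;
}

/* nextbits(n): peek at the next n bits (n <= 32) without advancing */
static uint32_t nextbits(const BitReader *br, unsigned n) {
    uint32_t v = 0;
    for (unsigned i = 0; i < n; i++) {
        size_t p = br->bitpos + i;
        v = (v << 1) | (uint32_t)((br->data[p / 8] >> (7 - p % 8)) & 1);
    }
    return v;
}

/* next_start_code(): discard zero stuffing bits up to a byte boundary,
 * then skip zero stuffing bytes until the 0x000001 start-code prefix */
static void next_start_code(BitReader *br) {
    while (!bytealigned(br))
        br->bitpos++;
    while (br->bitpos + 24 <= br->len * 8 && nextbits(br, 24) != 0x000001)
        br->bitpos += 8;
}
```

The sketch makes the byte-alignment guarantee visible: after next_start_code( ) the position is always on a byte boundary, and any preceding zero stuffing has been consumed.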

4. BACKGROUND

4.1 Advanced Television Systems Committee (ATSC)

The Advanced Television Systems Committee, chaired by James C. McKinney, was formed by the member organizations of the Joint Committee on InterSociety Coordination (JCIC)1 for the purpose of exploring the need for, and where appropriate coordinating the development of, documentation of Advanced Television Systems. Documentation is understood to include voluntary technical standards, recommended practices, and engineering guidelines.

Proposed documentation may be developed by the ATSC, by member organizations of the JCIC, or by existing standards committees. The ATSC was established recognizing that the prompt, efficient and effective development of a coordinated set of national standards is essential to the future development of domestic television services.

On June 5, 1992 the ATSC provided information to the Federal Communications Commission (FCC) outlining proposed industry actions to fully document the advanced television system standard. The FCC has recognized that prompt disclosure of the system's technical specifications is important to the timely mass production of professional and consumer advanced television equipment. The FCC has further noted its appreciation of the diligence with which the ATSC and the other groups participating in the standardization are pursuing these matters.2

Supporting this activity, the ATSC Executive Committee requested that the T3/S1 Specialist Group on Macro Systems Approach meet and suggest which portions of an advanced television system broadcasting standard might require action by the FCC and which portions should be voluntary.

Subsequently, T3/S1 held meetings and developed recommendations in two areas:

1. Principles upon which documentation of the advanced television system should be based; and

2. A list of characteristics of an advanced television system that should be documented.

The list tentatively identified the industry group(s) that would provide the documentation information and the document where the information would likely appear.

The recommendations developed by the T3/S1 Specialist Group were modified by T3 to accommodate information and knowledge about advanced television systems developed in the period since June 1992. Some of the modifications to the recommendations ensued from the formation of the Grand Alliance. The modified guidelines were approved at the March 31, 1994 meeting of the T3 Technology Group on Distribution and are described in Section 4.5.

4.2 Advisory Committee on Advanced Television Service (ACATS)

A "Petition for Notice of Inquiry" was filed with the FCC on February 21, 1987 by 58 broadcasting organizations and companies requesting that the Commission initiate a proceeding to explore the issues arising from the introduction of advanced television technologies and their possible impact on the television broadcasting service. At that time, it was generally believed that High Definition Television (HDTV) could not be broadcast using 6 MHz terrestrial broadcasting channels. The broadcasting organizations were concerned that alternative media would be able to deliver HDTV to the viewing public, placing terrestrial broadcasting at a severe disadvantage.

The FCC agreed that this was a subject of utmost importance and initiated a proceeding (MM Docket No. 87-268) to consider the technical and public policy issues of advanced television systems. The Advisory Committee on Advanced Television Service was empaneled by the Federal Communications Commission in 1987 with Richard E. Wiley as chairman to develop information that would assist the FCC in establishing an advanced television standard for the United States. The objective given to the Advisory Committee in its Charter by the FCC was:

"The Committee will advise the Federal Communications Commission on the facts and circumstances regarding advanced television systems for Commission consideration of technical and public policy issues. In the event that the Commission decides that adoption of some form of advanced broadcast television is in the public interest, the Committee would also recommend policies, standards and regulations that would facilitate the orderly and timely introduction of advanced television services in the United States."

The Advisory Committee established a series of subgroups to study the various issues concerning services, technical parameters, and testing mechanisms required to establish an advanced television system standard. The Advisory Committee also established a system evaluation, test and analysis process that began with over twenty proposed systems, reducing them to four final systems for consideration.

4.3 Digital HDTV Grand Alliance (Grand Alliance)

On May 24, 1993 the three groups that had developed the four final digital systems agreed to produce a single, best-of-the-best system to propose as the standard. The three groups (AT&T and Zenith Electronics Corporation; General Instrument Corporation and the Massachusetts Institute of Technology; and Philips Consumer Electronics, Thomson Consumer Electronics, and the David Sarnoff Research Center) have been working together as the "Digital HDTV Grand Alliance." The system described in this Standard is based on the Digital HDTV Grand Alliance proposal to the Advisory Committee.

4.4 Organization for documenting the Digital Television Standard

The ATSC Executive Committee assigned the work of documenting the advanced television system standards to T3 specialist groups, dividing the work into five areas of interest: Video (including input signal format and source coding), Audio (including input signal format and source coding), Transport (including data multiplex and channel coding), RF/Transmission (including the modulation subsystem), and Receiver characteristics. A steering committee consisting of the chairs of the five specialist groups, the chair and vice-chairs of T3, and liaison among the ATSC, the FCC, and ACATS was established to coordinate the development of the documents. The members of the steering committee and areas of interest were as follows:

Stanley Baron T3 chair

Jules Cohen T3 vice-chair

Brian James T3 vice-chair

Larry Pearlstein T3/S6 (Video systems characteristics), chair

Graham S. Stubbs T3/S7 (Audio systems characteristics), chair

Bernard J. Lechner T3/S8 (Service multiplex and transport systems characteristics), chair

Lynn D. Claudy T3/S9 (RF/Transmission systems characteristics), chair

Werner F. Wedam T3/S10 (Receiver characteristics), chair

Robert M. Rast Grand Alliance facilitator

Robert Hopkins ATSC

Robert M. Bromery FCC Office of Engineering and Technology

Gordon Godfrey FCC Mass Media Bureau

Paul E. Misener ACATS

4.5 Principles for documenting the Digital Television Standard

T3 adopted the following principles for documenting the advanced television system standard:

1. The Grand Alliance was recognized as the principal supplier of information for documenting the advanced television system, supported by the ATSC and others. Other organizations seen as suppliers of information include the EIA, FCC, IEEE, MPEG, NCTA, and SMPTE.

2. The Grand Alliance was encouraged to begin drafting the essential elements of system details as soon as possible to avoid delays in producing the advanced television system documentation.

3. FCC requirements for the advanced television system standard were to be obtained as soon as possible.

4. Complete functional system details (permitting those skilled in the art to construct a working system) were to be made publicly available.

5. Protection of any intellectual property made public must be by patent or copyright as appropriate.

6. The advanced television system documentation shall include the necessary system information such that audio and video encoders may be manufactured to deliver the system's full demonstrated performance quality.

7. The advanced television system documentation shall point to existing standards, recommended practices or guideline documents. These documents shall be referenced in one of two ways as deemed appropriate for the application. In the first instance, a specific revision shall be specified where review of changes to the referenced document is required before changes might be incorporated into the advanced television system document. The second instance references the document without specificity to revision and allows any changes to the referenced documents to be automatically incorporated.

8. System specifications shall explain how future, compatible improvements may be achieved.

9. As ongoing improvements take place in the advanced television system, manufacturers of encoders and decoders should coordinate their efforts to ensure compatibility.

10. The advanced television system standard must support backward compatibility of future improvements with all generations of advanced television system receivers and inherently support production of low cost receivers (notwithstanding that cost reduction through reduced performance quality may also be used to achieve inexpensive products).

11. The advanced television system standard should not foreclose flexibility in implementing advanced television system receivers at different price and performance levels.

12. The advanced television system standard should not foreclose flexibility in implementing program services or in data stream modification or insertion of data packets by down-stream (local) service providers.

13. The advanced television system documentation shall address interoperability with non-broadcast delivery systems including cable.

14. The advanced television system standard shall identify critical system parameters and shall provide information as to the range of acceptable values, the method of measurement, and the location in the system where measurement takes place.

5. SYSTEM OVERVIEW

5.1 Objectives

The Digital Television Standard describes a system designed to transmit high quality video, audio, and ancillary data over a single 6 MHz channel. The system can reliably deliver about 19 Mbps of throughput in a 6 MHz terrestrial broadcasting channel and about 38 Mbps of throughput in a 6 MHz cable television channel. This means that encoding a video source whose resolution can be as high as five times that of conventional television (NTSC) requires a bit rate reduction by a factor of 50 or higher. To achieve this bit rate reduction, the system is designed to be efficient in utilizing available channel capacity by exploiting complex video and audio compression technology.

The objective is to maximize the information passed through the data channel by minimizing the amount of data required to represent the video image sequence and its associated audio; that is, to represent the video, audio, and data sources with as few bits as possible while preserving the level of quality required for the given application.

Although the RF/Transmission subsystems described in this Standard are designed specifically for terrestrial and cable applications, the objective is that the video, audio, and service multiplex/transport subsystems be useful in other applications.

5.2 System block diagram

A basic block diagram representation of the system is shown in Figure 5.1. This representation is based on one adopted by the International Telecommunication Union, Radiocommunication Sector (ITU-R), Task Group 11/3 (Digital Terrestrial Television Broadcasting). According to this model, the digital television system can be seen to consist of three subsystems.3

1. Source coding and compression,

2. Service multiplex and transport, and

3. RF/Transmission.

Figure 5.1. ITU-R digital terrestrial television broadcasting model.

"Source coding and compression" refers to the bit rate reduction methods, also known as data compression, appropriate for application to the video, audio, and ancillary digital data streams. The term "ancillary data" includes control data, conditional access control data, and data associated with the program audio and video services, such as closed captioning. "Ancillary data" can also refer to independent program services. The purpose of the coder is to minimize the number of bits needed to represent the audio and video information. The digital television system employs the MPEG-2 video stream syntax for the coding of video and the Digital Audio Compression (AC-3) Standard for the coding of audio.

"Service multiplex and transport" refers to the means of dividing the digital data stream into "packets" of information, the means of uniquely identifying each packet or packet type, and the appropriate methods of multiplexing video data stream packets, audio data stream packets, and ancillary data stream packets into a single data stream. In developing the transport mechanism, interoperability among digital media, such as terrestrial broadcasting, cable distribution, satellite distribution, recording media, and computer interfaces, was a prime consideration. The digital television system employs the MPEG-2 transport stream syntax for the packetization and multiplexing of video, audio, and data signals for digital broadcasting systems.4 The MPEG-2 transport stream syntax was developed for applications where channel bandwidth or recording media capacity is limited and the requirement for an efficient transport mechanism is paramount. It was designed also to facilitate interoperability with the ATM transport mechanism.

"RF/Transmission" refers to channel coding and modulation. The channel coder takes the data bit stream and adds additional information that can be used by the receiver to reconstruct the data from the received signal which, due to transmission impairments, may not accurately represent the transmitted signal. The modulation (or physical layer) uses the digital data stream information to modulate the transmitted signal. The modulation subsystem offers two modes: a terrestrial broadcast mode (8 VSB), and a high data rate mode (16 VSB).

Figure 5.2 illustrates a high level view of encoding equipment. This view is not intended to be complete, but is used to illustrate the relationship of various clock frequencies within the encoder. There are two domains within the encoder where a set of frequencies are related: the source coding domain and the channel coding domain.

Figure 5.2. High level view of encoding equipment.

The source coding domain, represented schematically by the video, audio and transport encoders, uses a family of frequencies which are based on a 27 MHz clock (f27MHz). This clock is used to generate a 42-bit sample of the frequency which is partitioned into two parts defined by the MPEG-2 specification. These are the 33-bit program_clock_reference_base and the 9-bit program_clock_reference_extension. The former is equivalent to a sample of a 90 kHz clock which is locked in frequency to the 27 MHz clock, and is used by the audio and video source encoders when encoding the presentation time stamp (PTS) and the decode time stamp (DTS). The audio and video sampling clocks, fa and fv respectively, must be frequency-locked to the 27 MHz clock. This can be expressed as the requirement that there exist two pairs of integers, (na, ma) and (nv, mv), such that:

fa = ( na / ma ) × 27 MHz and fv = ( nv / mv ) × 27 MHz

The channel coding domain is represented by the FEC/Sync Insertion subsystem and the VSB modulator. The relevant frequencies in this domain are the VSB symbol frequency (fsym) and the frequency of the transport stream (fTP), which is the frequency of transmission of the encoded transport stream. These two frequencies must be locked, having the relation:

The signals in the two domains are not required to be frequency-locked to each other, and in many implementations will operate asynchronously. In such systems, the frequency drift can necessitate the occasional insertion or deletion of a NULL packet from within the transport stream, thereby accommodating the frequency disparity.
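The 42-bit split of the 27 MHz clock sample described above can be illustrated in C. The function names are assumptions for this sketch, but the arithmetic follows the ISO/IEC 13818-1 partition: dividing by 300 yields the 90 kHz base, and the remainder is the extension:

```c
#include <assert.h>
#include <stdint.h>

#define PCR_BASE_MOD (1ULL << 33)   /* 33-bit base field, 90 kHz units */

/* 33-bit program_clock_reference_base: a 90 kHz sample of the clock */
static uint64_t pcr_base(uint64_t clk27) {
    return (clk27 / 300) % PCR_BASE_MOD;
}

/* 9-bit program_clock_reference_extension: residual 27 MHz ticks, 0..299 */
static uint32_t pcr_extension(uint64_t clk27) {
    return (uint32_t)(clk27 % 300);
}

/* Reassemble the full 42-bit value back into 27 MHz clock units */
static uint64_t pcr_27mhz(uint64_t clk27) {
    return pcr_base(clk27) * 300 + pcr_extension(clk27);
}
```

Because the base alone is a 90 kHz sample, it is sufficient for the PTS and DTS comparisons mentioned above; the extension restores full 27 MHz resolution when needed.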

The annexes that follow consider the characteristics of the subsystems necessary to accommodate the services envisioned.

Annex A

(Normative)

Video Systems Characteristics

1. SCOPE

This Annex describes the characteristics of the video subsystem of the Digital Television Standard. The input formats and bit stream characteristics are described in separate sections.

2. REFERENCES

2.1 Normative references

The following documents contain provisions which, through reference in this text, constitute provisions of this standard. At the time of publication, the editions indicated were valid. All standards are subject to revision, and parties to agreement based on this standard are encouraged to investigate the possibility of applying the most recent editions of the documents listed below.

ISO/IEC IS 13818-1, International Standard (1994), MPEG-2 Systems.

ISO/IEC IS 13818-2, International Standard (1994), MPEG-2 Video.

2.2 Informative references

SMPTE 274M (1995), Standard for television, 1920 x 1080 Scanning and Interface.

SMPTE S17.392 (1995), Proposed Standard for television, 1280 x 720 Scanning and Interface.

ITU-R BT.601-4 (1994), Encoding parameters of digital television for studios.

3. COMPLIANCE NOTATION

As used in this document, "shall" or "will" denotes a mandatory provision of the standard. "Should" denotes a provision that is recommended but not mandatory. "May" denotes a feature whose presence does not preclude compliance, and that may or may not be present at the option of the implementor.

4. POSSIBLE VIDEO INPUTS

While not required by this standard, there are certain television production standards, shown in Table 1, that define video formats that relate to compression formats specified by this standard.

Table 1 Standardized Video Input Formats

The compression formats may be derived from one or more appropriate video input formats. It may be anticipated that additional video production standards will be developed in the future that extend the number of possible input formats.

5. SOURCE CODING SPECIFICATION

The ATV video compression algorithm shall conform to the Main Profile syntax of ISO/IEC 13818-2. The allowable parameters shall be bounded by the upper limits specified for the Main Profile at High Level.5 Additionally, ATV bit streams shall meet the constraints and specifications described in Sections 5.1 and 5.2.

5.1 Constraints with respect to ISO/IEC 13818-2 Main Profile

The following tables list the allowed values for each of the ISO/IEC 13818-2 syntactic elements which are restricted beyond the limits imposed by MP@HL.

In these tables conventional numbers denote decimal values, numbers preceded by 0x are to be interpreted as hexadecimal values and numbers within single quotes (e.g., '10010100') are to be interpreted as a string of binary digits.

5.1.1 Sequence header constraints

Table 2 identifies parameters in the sequence header of a bit stream that shall be constrained by the video subsystem and lists the allowed values for each.

Table 2 Sequence Header Constraints

The allowable values for the field bit_rate_value are application dependent. In the primary application of terrestrial broadcast, this field shall correspond to a bit rate which is less than or equal to 19.4 Mbps. In the high data rate mode, the corresponding bit rate is less than or equal to 38.8 Mbps.

5.1.2 Compression format constraints

Table 3 lists the allowed compression formats.

Table 3 Compression Format Constraints

5.1.3 Sequence extension constraints

Table 4 identifies parameters in the sequence extension part of a bit stream that shall be constrained by the video subsystem and lists the allowed values for each. A sequence_extension structure is required to be present after every sequence_header structure.

Table 4 Sequence Extension Constraints

Note: The profile_and_level_indication field shall indicate the lowest profile and level defined in ISO/IEC 13818-2, Section 8, that is consistent with the parameters of the video elementary stream.

5.1.4 Sequence display extension constraints

Table 5 identifies parameters in the sequence display extension part of a bit stream that shall be constrained by the video subsystem and lists the allowed values for each.

Table 5 Sequence Display Extension Constraints

The preferred and default values for color_primaries, transfer_characteristics, and matrix_coefficients are defined to be SMPTE 274M6 (value 0x01 in all three cases). While all values described by MPEG-2 are allowed in the transmitted bit stream, it is noted that SMPTE 170M values (0x06 in all three cases) will be the most likely alternate in common use.

5.1.5 Picture header constraints

In all cases other than when vbv_delay has the value 0xFFFF, the value of vbv_delay shall be constrained as follows:

vbv_delay ≤ 45000
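As a sketch, this constraint could be checked as follows. vbv_delay is counted in 90 kHz units, so 45000 corresponds to 0.5 second; 0xFFFF is the MPEG-2 escape value meaning the delay is unspecified. The function name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if a picture-header vbv_delay value satisfies the
 * constraint above (90 kHz units; 45000 ticks = 0.5 s). */
static int vbv_delay_ok(uint16_t vbv_delay) {
    if (vbv_delay == 0xFFFF)       /* "unspecified" escape value */
        return 1;
    return vbv_delay <= 45000;
}
```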

5.2 Bit stream specifications beyond MPEG-2

This section covers the extension and user data part of the video syntax. These data are inserted at the sequence, GOP, and picture level. The syntax used for the insertion of closed captioning in picture user data is described.7

5.2.1 Picture extension and user data syntax

Table 6 describes the syntax used for picture extension and user data.

Table 6 Picture Extension and User Data Syntax

5.2.2 Picture user data syntax

Table 7 describes the picture user data syntax.

Table 7 Picture User Data Syntax8

5.2.3 Picture user data semantics

user_data_start_code – This is set to 0x0000 01B2.

ATSC_identifier – This is a 32 bit code that indicates that the video user data conforms to this specification. The value of ATSC_identifier shall be 0x4741 3934.

user_data_type_code – The 8-bit code is set to 0x03.

process_em_data_flag – This flag is set to indicate whether it is necessary to process the em_data. If it is set to 1, the em_data has to be parsed and its meaning has to be processed. When it is set to 0, the em_data can be discarded.

process_cc_data_flag – This flag is set to indicate whether it is necessary to process the cc_data. If it is set to 1, the cc_data has to be parsed and its meaning has to be processed. When it is set to 0, the cc_data can be discarded.

additional_data_flag – This flag is set to 1 to indicate the presence of additional user data.

cc_count – This 5-bit integer indicates the number of closed caption constructs following this field. It can have values 0 through 31. The value of cc_count shall be set according to the frame rate and coded picture structure (field or frame) such that a fixed bandwidth of 9600 bits per second is maintained for the closed caption payload data. Sixteen (16) bits of closed caption payload data are carried in each pair of the fields cc_data_1 and cc_data_2.

em_data – Eight bits for representing emergency message.9

cc_valid – This flag is set to '1' to indicate that the two closed caption data bytes that follow are valid. If set to '0' the two data bytes are invalid.

cc_type – Denotes the type of the two closed caption data bytes that follow.10

cc_data_1 – The first byte of a closed caption data pair.

cc_data_2 – The second byte of a closed caption data pair.

additional_user_data – Any further demand for picture user data could be met by defining this part of the bit stream.
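The fixed 9600 bps closed caption payload implies a simple sizing rule for cc_count: with 16 payload bits carried per caption construct, the count per coded picture depends only on the picture rate. A hedged sketch (the function name is an assumption, and an integral result is assumed for the picture rates of interest):

```c
#include <assert.h>

/* cc_count needed per coded picture to sustain a constant 9600 bit/s
 * closed caption payload, given 16 payload bits per construct. */
static int cc_count_for(int pictures_per_second) {
    return 9600 / (16 * pictures_per_second);
}
```

For example, 30 frame pictures per second require 20 constructs per picture, while 60 field pictures per second require 10 per picture; both stay within the 5-bit field's 0-31 range.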

Annex B

(Normative)

Audio Systems Characteristics

1. SCOPE

This Annex describes the audio system characteristics and normative specifications of the Digital Television Standard.

2. NORMATIVE REFERENCES

The following documents contain provisions which in whole or part, through reference in this text, constitute provisions of this standard. At the time of publication, the editions indicated were valid. All standards are subject to revision and amendment, and parties to agreement based on this standard are encouraged to investigate the possibility of applying the most recent editions of the documents listed below.

ATSC Standard A/52 (1995), Digital Audio Compression (AC-3).

AES 3-1992 (ANSI S4.40-1992), AES Recommended Practice for digital audio engineering – Serial transmission format for two-channel linearly represented digital audio data.

ANSI S1.4-1983, Specification for Sound Level Meters.

IEC 651 (1979), Sound Level Meters.

IEC 804 (1985), Amendment 1 (1989) Integrating/Averaging Sound Level Meters.

3. COMPLIANCE NOTATION

As used in this document, "shall" or "will" denotes a mandatory provision of the standard. "Should" denotes a provision that is recommended but not mandatory. "May" denotes a feature whose presence does not preclude compliance, and that may or may not be present at the option of the implementor.

4. SYSTEM OVERVIEW

As illustrated in Figure 1, the audio subsystem comprises the audio encoding/decoding function and resides between the audio inputs/outputs and the transport subsystem. The audio encoder(s) is (are) responsible for generating the audio elementary stream(s) which are encoded representations of the baseband audio input signals. At the receiver, the audio subsystem is responsible for decoding the audio elementary stream(s) back into baseband audio.

Figure 1. Audio subsystem in the digital television system.

5. SPECIFICATION

This Section forms the normative specification of the audio system. The audio compression system conforms with the Digital Audio Compression (AC-3) Standard, subject to the constraints outlined in this Section.

5.1 Constraints with respect to ATSC Standard A/52

The digital television audio coding system is based on the Digital Audio Compression (AC-3) Standard specified in the body of ATSC Doc. A/52 (the annexes are not included). Constraints on the system are shown in Table 1 which shows permitted values of certain syntactical elements. These constraints are described in Sections 5.2 - 5.4.

Table 1 Audio Constraints

5.2 Sampling frequency

The system conveys digital audio sampled at a frequency of 48 kHz, locked to the 27 MHz system clock. The 48 kHz audio sampling clock is defined as:

(1) 48 kHz audio sample rate = ( 2 / 1125 ) × ( 27 MHz system clock )

If analog signal inputs are employed, the A/D converters should sample at 48 kHz. If digital inputs are employed, the input sampling rate shall be 48 kHz, or the audio encoder shall contain sampling rate converters which convert the sampling rate to 48 kHz.

5.3 Bit rate

A main audio service, or an associated audio service which is a complete service (containing all necessary program elements) shall be encoded at a bit rate less than or equal to 384 kbps. A single channel associated service containing a single program element shall be encoded at a bit rate less than or equal to 128 kbps. A two channel associated service containing only dialogue shall be encoded at a bit rate less than or equal to 192 kbps. The combined bit rate of a main service and an associated service which are intended to be decoded simultaneously shall be less than or equal to 512 kbps.
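The bit-rate ceilings above can be summarized in a small validity check. The enum and function names below are assumptions for illustration, not syntax from the standard:

```c
#include <assert.h>

typedef enum {
    SVC_COMPLETE,            /* main service, or complete associated service */
    SVC_SINGLE_CHANNEL,      /* single-element associated service */
    SVC_TWO_CHANNEL_DIALOG   /* dialogue-only two-channel associated service */
} ServiceKind;

/* Per-service ceiling in kbps, per the constraints above */
static int bit_rate_ok(ServiceKind kind, int kbps) {
    switch (kind) {
    case SVC_COMPLETE:           return kbps <= 384;
    case SVC_SINGLE_CHANNEL:     return kbps <= 128;
    case SVC_TWO_CHANNEL_DIALOG: return kbps <= 192;
    }
    return 0;
}

/* Main + simultaneously decoded associated service: combined ceiling */
static int combined_ok(int main_kbps, int assoc_kbps) {
    return main_kbps + assoc_kbps <= 512;
}
```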

5.4 Audio coding modes

Audio services shall be encoded using any of the audio coding modes specified in A/52, with the exception of the 1+1 mode. The value of acmod in the AC-3 bit stream shall have a value in the range of 1-7, with the value 0 prohibited.

5.5 Dialogue level

The value of the dialnorm parameter in the AC-3 elementary bit stream shall indicate the level of average spoken dialogue within the encoded audio program. Dialogue level may be measured by means of an "A" weighted integrated measurement (LAeq). (Receivers use the value of dialnorm to adjust the reproduced audio level so as to normalize the dialogue level.)

5.6 Dynamic range compression

Each encoded audio block may contain a dynamic range control word (dynrng) which is used by decoders (by default) to alter the level of the reproduced audio. The control words allow the decoded signal level to be increased or decreased by up to 24 dB. In general, elementary streams may have dynamic range control words inserted or modified without affecting the encoded audio. When it is necessary to alter the dynamic range of audio programs which are broadcast, the dynamic range control word should be used.

6. MAIN AND ASSOCIATED SERVICES

6.1 Overview

An AC-3 elementary stream contains the encoded representation of a single audio service. Multiple audio services are provided by multiple elementary streams. Each elementary stream is conveyed by the transport multiplex with a unique PID. There are a number of audio service types which may (individually) be coded into each elementary stream. Each AC-3 elementary stream is tagged as to its service type using the bsmod bit field. There are two types of main service and six types of associated service. Each associated service may be tagged (in the AC-3 audio descriptor in the transport PSI data) as being associated with one or more main audio services. Each AC-3 elementary stream may also be tagged with a language code.

Associated services may contain complete program mixes, or may contain only a single program element. Associated services which are complete mixes may be decoded and used as is. They are identified by the full_svc bit in the AC-3 descriptor (see A/52, Annex A). Associated services which contain only a single program element are intended to be combined with the program elements from a main audio service.

This Section specifies the meaning and use of each type of service. In general, a complete audio program (what is presented to the listener over the set of loudspeakers) may consist of a main audio service, an associated audio service which is a complete mix, or a main audio service combined with an associated audio service. The capability to simultaneously decode one main service and one associated service is required in order to form a complete audio program in certain service combinations described in this Section. This capability may not exist in some receivers.

6.2 Summary of service types

The audio service types are listed in Table 2.

Table 2 Audio Service Types

6.3 Complete main audio service (CM)

The CM type of main audio service contains a complete audio program (complete with dialogue, music, and effects). This is the type of audio service normally provided. The CM service may contain from 1 to 5.1 audio channels. The CM service may be further enhanced by means of the VI, HI, C, E, or VO associated services described below. Audio in multiple languages may be provided by supplying multiple CM services, each in a different language.

6.4 Main audio service, music and effects (ME)

The ME type of main audio service contains the music and effects of an audio program, but not the dialogue for the program. The ME service may contain from 1 to 5.1 audio channels. The primary program dialogue is missing and (if any exists) is supplied by simultaneously encoding a D associated service. Multiple D associated services in different languages may be associated with a single ME service.

6.5 Visually impaired (VI)

The VI associated service typically contains a narrative description of the visual program content. In this case, the VI service shall be a single audio channel. The simultaneous reproduction of both the VI associated service and the CM main audio service allows the visually impaired user to enjoy the main multi-channel audio program, as well as to follow (by ear) the on-screen activity.

The dynamic range control signal in this type of VI service is intended to be used by the audio decoder to modify the level of the main audio program. Thus the level of the main audio service will be under the control of the VI service provider, and the provider may signal the decoder (by altering the dynamic range control words embedded in the VI audio elementary stream) to reduce the level of the main audio service by up to 24 dB in order to assure that the narrative description is intelligible.

Besides providing the VI service as a single narrative channel, the VI service may be provided as a complete program mix containing music, effects, dialogue, and the narration. In this case, the service may be coded using any number of channels (up to 5.1), and the dynamic range control signal applies only to this service. The fact that the service is a complete mix shall be indicated in the AC-3 descriptor (see A/52, Annex A).
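The dynamic range control mechanism described above works in decibels: a cut of up to 24 dB corresponds to a linear gain factor of 10^(-24/20), roughly 0.063. The sketch below illustrates only this dB-to-gain arithmetic; the function names and the plain per-sample multiply are illustrative assumptions, not the AC-3 dynamic range control algorithm itself.

```python
# Sketch: applying a dynamic-range-control gain cut to main-service
# samples. The 24 dB maximum comes from the text above; the helper
# names and the per-sample multiply are illustrative, not the actual
# AC-3 decoder processing.

def db_to_gain(db_cut):
    """Convert an attenuation in dB to a linear gain factor."""
    return 10.0 ** (-db_cut / 20.0)

def attenuate(samples, db_cut):
    """Scale main-service samples down by db_cut decibels."""
    g = db_to_gain(db_cut)
    return [s * g for s in samples]

max_reduction = db_to_gain(24.0)  # maximum cut a VI service may request
```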

A. Hearing impaired (HI)

The HI associated service typically contains only dialogue which is intended to be reproduced simultaneously with the CM service. In this case, the HI service shall be a single audio channel. This dialogue may have been processed for improved intelligibility by hearing impaired listeners. Simultaneous reproduction of both the CM and HI services allows the hearing impaired listener to hear a mix of the CM and HI services in order to emphasize the dialogue while still providing some music and effects.

Besides providing the HI service as a single dialogue channel, the HI service may be provided as a complete program mix containing music, effects, and dialogue with enhanced intelligibility. In this case, the service may be coded using any number of channels (up to 5.1). The fact that the service is a complete mix shall be indicated in the AC-3 descriptor (see A/52, Annex A).

A. Dialogue (D)

The D associated service contains program dialogue intended for use with an ME main audio service. The language of the D service is indicated in the AC-3 bit stream, and in the audio descriptor. A complete audio program is formed by simultaneously decoding the D service and the ME service and mixing the D service into the center channel of the ME main service (with which it is associated).

If the ME main audio service contains more than two audio channels, the D service shall be monophonic (1/0 mode). If the main audio service contains two channels, the D service may also contain two channels (2/0 mode). In this case, a complete audio program is formed by simultaneously decoding the D service and the ME service, mixing the left channel of the ME service with the left channel of the D service, and mixing the right channel of the ME service with the right channel of the D service. The result will be a two channel stereo signal containing music, effects, and dialogue.

Audio in multiple languages may be provided by supplying multiple D services (each in a different language) along with a single ME service. This is more efficient than providing multiple CM services, but, in the case of more than two audio channels in the ME service, requires that dialogue be restricted to the center channel.

Some receivers may not have the capability to simultaneously decode an ME and a D service.
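The D-into-ME mixing rules above can be sketched as follows. The channel layout names ("L", "C", "R") and the simple additive mix are illustrative assumptions; an actual decoder mixes in its own internal representation.

```python
# Sketch of the D-into-ME mixing rule described above: a mono D
# service mixes into the center channel; a two-channel D service mixes
# left with left and right with right of a two-channel ME service.
# Channels are dicts of name -> sample list; names and additive mixing
# are illustrative assumptions.

def mix_d_into_me(me, d):
    out = {ch: list(s) for ch, s in me.items()}
    if len(d) == 1:                      # 1/0 mode: mix into center
        mono = next(iter(d.values()))
        out["C"] = [a + b for a, b in zip(out["C"], mono)]
    elif len(d) == 2 and len(me) == 2:   # 2/0 mode: channel-by-channel mix
        out["L"] = [a + b for a, b in zip(out["L"], d["L"])]
        out["R"] = [a + b for a, b in zip(out["R"], d["R"])]
    else:
        raise ValueError("D must be mono when ME has more than two channels")
    return out
```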

A. Commentary (C)

The commentary associated service is similar to the D service, except that instead of conveying essential program dialogue, the C service conveys optional program commentary. The C service may be a single audio channel containing only the commentary content. In this case, simultaneous reproduction of a C service and a CM service will allow the listener to hear the added program commentary.

The dynamic range control signal in the single channel C service is intended to be used by the audio decoder to modify the level of the main audio program. Thus the level of the main audio service will be under the control of the C service provider, and the provider may signal the decoder (by altering the dynamic range control words embedded in the C audio elementary stream) to reduce the level of the main audio service by up to 24 dB in order to assure that the commentary is intelligible.

Besides providing the C service as a single commentary channel, the C service may be provided as a complete program mix containing music, effects, dialogue, and the commentary. In this case the service may be provided using any number of channels (up to 5.1). The fact that the service is a complete mix shall be indicated in the AC-3 descriptor (see A/52, Annex A).

A. Emergency (E)

The E associated service is intended to allow the insertion of emergency or high priority announcements. The E service is always a single audio channel. An E service is given priority in transport and in audio decoding. Whenever the E service is present, it will be delivered to the audio decoder. Whenever the audio decoder receives an E type associated service, it will stop reproducing any main service being received and only reproduce the E service out of the center channel (or left and right channels if a center loudspeaker does not exist). The E service may also be used for non-emergency applications. It may be used whenever the broadcaster wishes to force all decoders to quit reproducing the main audio program and reproduce a higher priority single audio channel.

A. Voice-over (VO)

The VO associated service is a single channel service intended to be reproduced along with the main audio service in the receiver. It allows typical voice-overs to be added to an already encoded audio elementary stream without requiring the audio to be decoded back to baseband and then re-encoded. It is always a single audio channel. It has second priority (only the E service has higher priority). It is intended to be simultaneously decoded and mixed into the center channel of the main audio service. The dynamic range control signal in the VO service is intended to be used by the audio decoder to modify the level of the main audio program. Thus the level of the main audio service may be controlled by the broadcaster, and the broadcaster may signal the decoder (by altering the dynamic range control words embedded in the VO audio elementary stream) to reduce the level of the main audio service by up to 24 dB during the voice-over.

Some receivers may not have the capability to simultaneously decode and reproduce a voice-over service along with a program audio service.

I. Audio encoder interfaces

A. Audio encoder input characteristics

Audio signals which are input to the digital television system may be in analog or digital form. Audio signals should have any DC offset removed before being encoded. If the audio encoder does not include a DC blocking high pass filter, the audio signals should be high pass filtered before being applied to the encoder. In general, input signals should be quantized to at least 16-bit resolution. The audio compression system can convey audio signals with up to 24-bit resolution. Physical interfaces for the audio inputs to the encoder may be defined as voluntary industry standards by the AES, SMPTE, or other standards organizations.

A. Audio encoder output characteristics

Conceptually, the output of the audio encoder is an elementary stream which is formed into PES packets within the transport subsystem. It is possible that systems will be implemented wherein the formation of audio PES packets takes place within the audio encoder. In this case, the output(s) of the audio encoder(s) would be PES packets. Physical interfaces for these outputs (elementary streams and/or PES packets) may be defined as voluntary industry standards by SMPTE or other standards organizations.

Annex C

(Normative)

Service Multiplex and Transport Systems Characteristics

I. Scope

This Annex describes the transport layer characteristics and normative specifications of the Digital Television Standard.

I. Normative references

The following documents contain provisions which in whole or in part, through reference in this text, constitute provisions of this Standard. At the time of publication, the editions indicated were valid. All standards are subject to revision and amendment, and parties to agreements based on this Standard are encouraged to investigate the possibility of applying the most recent editions of the documents listed below.

ATSC Standard A/52 (1995), Digital Audio Compression (AC-3).

ISO/IEC IS 13818-1, International Standard (1994), MPEG-2 Systems.

ISO/IEC IS 13818-2, International Standard (1994), MPEG-2 Video.

ISO/IEC CD 13818-4, MPEG Committee Draft (1994), MPEG-2 Compliance.

The normative reference for the Program Guide will be the standard developed from ATSC document T3/S8-050, "Program Guide for Digital Television".

The normative reference for System Information will be the standard developed from ATSC document T3/S8-079, "System Information for Digital Television".

I. Compliance notation

As used in this document, "shall" or "will" denotes a mandatory provision of the standard. "Should" denotes a provision that is recommended but not mandatory. "May" denotes a feature whose presence does not preclude compliance, that may or may not be present at the option of the implementor.

I. System overview

The transport format and protocol for the Digital Television Standard is a compatible subset of the MPEG-2 Systems specification defined in ISO/IEC 13818-1. It is based on a fixed-length packet transport stream approach which has been defined and optimized for digital television delivery applications.

As illustrated in Figure 1, the transport function resides between the application (e.g., audio or video) encoding and decoding functions and the transmission subsystem. The encoder's transport subsystem is responsible for formatting the coded elementary streams and multiplexing the different components of the program for transmission. At the receiver, it is responsible for recovering the elementary streams for the individual application decoders and for the corresponding error signaling. The transport subsystem also incorporates other higher protocol layer functionality related to synchronization of the receiver.

Figure 1. Sample organization of functionality in a transmitter-receiver pair for a single program.

The overall system multiplexing approach can be thought of as a combination of multiplexing at two different layers. In the first layer, single program transport bit streams are formed by multiplexing transport packets from one or more Packetized Elementary Stream (PES) sources. In the second layer, many single program transport bit streams are combined to form a system of programs. The Program Specific Information (PSI) streams contain the information relating to the identification of programs and the components of each program.

Not shown explicitly in Figure 1, but essential to the practical implementation of this Standard, is a control system that manages the transfer and processing of the elementary streams from the application encoders. The rules followed by this control system are not a part of this Standard but must be established as recommended practices by the users of the Standard. The control system implementation shall adhere to the requirements of the MPEG-2 transport system as specified in ISO/IEC 13818-1 with the additional constraints specified in this Standard. These constraints may go beyond the constraints imposed by the application encoders.

I. Specification

This Section constitutes the normative specification for the transport system of the Digital Television Standard. The syntax and semantics of the specification conform to ISO/IEC 13818-1 subject to the constraints and conditions specified in this Standard. This Section of the Standard describes the
coding constraints that apply to the use of the MPEG-2 systems specification in the digital television system.

A. MPEG-2 Systems standard

The transport system is based on the transport stream definition of the MPEG-2 Systems standard as specified in ISO/IEC 13818-1.

1. Video T-STD

The video T-STD is specified in Section 2.4.2.3 of ISO/IEC 13818-1 and follows the constraints for the level encoded in the video elementary stream.

1. Audio T-STD

The audio T-STD is specified in Section 3.6 of Annex A of ATSC Standard A/52.

A. Registration descriptor

This Standard uses the registration descriptor described in Section 2.6.8 of ISO/IEC 13818-1 to identify the contents of programs and elementary streams to decoding equipment.

1. Program identifier

Programs which conform to this specification will be identified by the 32-bit identifier in the section of the Program Map Table (PMT) detailed in Section 2.4.4.8 of ISO/IEC 13818-1. The identifier will be coded according to Section 2.6.8, and shall have a value of 0x4741 3934.

1. Audio elementary stream identifier

Audio elementary streams which conform to this specification will be identified by the 32-bit identifier in the section of the Program Map Table (PMT) detailed in Section 2.4.4.8 of ISO/IEC 13818-1. The identifier will be coded according to Section 2.6.8, and shall have a value of 0x4143 2D33.
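The two 32-bit format identifiers above are four-character ASCII codes, which makes their intent visible when unpacked: 0x47413934 spells "GA94" and 0x41432D33 spells "AC-3".

```python
# The 32-bit format_identifier values in the registration descriptor
# are ASCII four-character codes: 0x47413934 decodes to "GA94" and
# 0x41432D33 decodes to "AC-3".

def format_identifier_to_ascii(value):
    """Unpack a 32-bit registration format_identifier into its 4 ASCII bytes."""
    return bytes((value >> shift) & 0xFF for shift in (24, 16, 8, 0)).decode("ascii")

program_id = format_identifier_to_ascii(0x47413934)  # "GA94"
audio_id = format_identifier_to_ascii(0x41432D33)    # "AC-3"
```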

A. The program paradigm

The program paradigm specifies the method that shall be used for allocating the values of the Packet Identifier (PID) field of the transport packet header in a systematic manner. Within one transport multiplex, television programs that follow the program paradigm are assigned a program number ranging from 1 to 255. The binary value of the program number is used to form b11 through b4 of the PID. Programs adhering to the paradigm shall have b12 equal to '0'. Programs not adhering to the paradigm shall have b12 equal to '1'.

We further define:

base_PID = program number << 4

where program number refers to each program within one transport multiplex and corresponds to the 16-bit program_number identified in PAT and PMT.

The b0 through b3 of the PID are assigned according to Table 1.

The paradigm to identify the transport bit streams containing certain elements of the program is
defined in Table 1.

Table 1 PID Assignment for the Constituent Elementary Streams of a Program

The program_map_table must be decoded to obtain the PIDs for services not defined by the paradigm but included within the program (such as a second data channel). According to the program paradigm, every 16th PID is a PMT_PID and may be assigned to a program. If a PMT_PID is assigned to a program by the program paradigm, the next 15 PIDs after that PMT_PID are reserved for elements of that program and shall not be otherwise assigned.
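The PID arithmetic of the program paradigm can be sketched directly from the text: base_PID is the program number shifted left four bits, b12 is 0 for paradigm programs, and the 16 PIDs starting at base_PID (b3 through b0 taking values 0 through 15) are reserved for that program. The per-element offsets within the block come from Table 1, which is not reproduced here, so they are omitted.

```python
# Sketch of the program-paradigm PID allocation described above.
# base_PID = program_number << 4; b12 = 0 for paradigm programs; the
# 16 PIDs base_PID..base_PID+15 are reserved for the program's
# elements. Element-specific offsets (Table 1) are intentionally
# omitted, since the table body is not reproduced in this text.

def base_pid(program_number):
    """base_PID for a program following the paradigm (b11..b4 = program number)."""
    if not 1 <= program_number <= 255:
        raise ValueError("paradigm program numbers range from 1 to 255")
    return program_number << 4

def element_pids(program_number):
    """The 16 PIDs reserved for one paradigm program (b3..b0 = 0..15)."""
    base = base_pid(program_number)
    return [base | i for i in range(16)]
```

Note that for program numbers 1 through 255, b12 of the resulting PID is automatically 0, as the paradigm requires.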

A. Constraints on PSI

The program constituents for all programs, including television programs that follow the program paradigm and other programs or services that do not follow the program paradigm, are described in the PSI. There are the following constraints on the PSI information:

Only one program is described in a PSI transport bit stream corresponding to a particular PMT_PID value. A transport bit stream containing a program_map_table shall not be used to transmit any other kind of PSI table (identified by a different table_id).

The maximum spacing between occurrences of a program_map_table containing television program information shall be 400 ms.

The program numbers are associated with the corresponding PMT_PIDs in the Program Association Table, which is carried in PID 0. The maximum spacing between occurrences of section 0 of the program_association_table shall be 100 ms.

The video elementary stream section shall contain the Data stream alignment descriptor described in Section 2.6.10 of ISO/IEC 13818-1. The alignment_type field shown in Table 2-47 of ISO/IEC 13818-1 shall be 0x02.

Adaptation headers shall not occur in transport packets of the PMT_PID for purposes other than for signaling with the discontinuity_indicator that the version_number (Section 2.4.4.5 of ISO/IEC 13818-1) may be discontinuous.

Adaptation headers shall not occur in transport packets of the PAT_PID for purposes other than for signaling with the discontinuity_indicator that the version_number (Section 2.4.4.5 of ISO/IEC 13818-1) may be discontinuous.

A. PES constraints

Packetized Elementary Stream syntax and semantics shall be used to encapsulate the audio and video elementary stream information. The Packetized Elementary Stream syntax is used to convey the Presentation Time-Stamp (PTS) and Decoding Time-Stamp (DTS) information required for decoding audio and video information with synchronism. This Section describes the coding constraints for this system layer.

Within the PES packet header, the following restrictions apply:

PES_scrambling_control shall be coded as '00'.

ESCR_flag shall be coded as '0'.

ES_rate_flag shall be coded as '0'.

PES_CRC_flag shall be coded as '0'.

Within the PES packet extension, the following restrictions apply.

PES_private_data_flag shall be coded as '0'.

pack_header_field_flag shall be coded as '0'.

program_packet_sequence_counter_flag shall be coded as '0'.

P-STD_buffer_flag shall be coded as '0'.
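The first four restrictions above apply to fields in the two flag bytes that follow PES_packet_length in the MPEG-2 PES packet header. A minimal checker, assuming the bit layout of the PES packet syntax in ISO/IEC 13818-1, might look like this (the PES-extension restrictions are not covered here, since they sit in a later byte):

```python
# Sketch: checking the PES header flag restrictions above against the
# two flag bytes that follow PES_packet_length. Bit positions follow
# the MPEG-2 PES packet syntax; returns a list of violated constraints.

def check_pes_flags(flags1, flags2):
    problems = []
    if (flags1 >> 4) & 0x3 != 0:   # PES_scrambling_control shall be '00'
        problems.append("PES_scrambling_control != '00'")
    if (flags2 >> 5) & 0x1:        # ESCR_flag shall be '0'
        problems.append("ESCR_flag != 0")
    if (flags2 >> 4) & 0x1:        # ES_rate_flag shall be '0'
        problems.append("ES_rate_flag != 0")
    if (flags2 >> 1) & 0x1:        # PES_CRC_flag shall be '0'
        problems.append("PES_CRC_flag != 0")
    return problems
```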

1. Video PES constraints

Each PES packet shall begin with a video access unit, as defined in Section 2.1.1 of ISO/IEC 13818-1, which is aligned with the PES packet header. The first byte of a PES packet payload shall be the first byte of a video access unit. Each PES header shall contain a PTS. Additionally, it shall contain a DTS as appropriate. For terrestrial broadcast, the PES packet shall not contain more than one coded video frame, and shall be void of video picture data only when transmitted in conjunction with the discontinuity_indicator to signal that the continuity_counter may be discontinuous.

Within the PES packet header, the following restrictions apply:

The PES_packet_length shall be coded as '0x0000'.

data_alignment_indicator shall be coded as '1'.

1. Audio PES constraints

The audio decoder may be capable of simultaneously decoding more than one elementary stream containing different program elements, and then combining the program elements into a complete program. In this case, the audio decoder may sequentially decode audio frames (or audio blocks) from each elementary stream and do the combining (mixing together) on a frame or (block) basis. In order to have the audio from the two elementary streams reproduced in exact sample synchronism, it is necessary for the original audio elementary stream encoders to have encoded the two audio program elements frame synchronously; i.e., if audio program 1 has sample 0 of frame n at time t0, then audio program 2 should also have frame n beginning with its sample 0 at the identical time t0. If the encoding is done frame synchronously, then matching audio frames should have identical values of PTS.

If PES packets from two audio services that are to be decoded simultaneously contain identical values of PTS then the corresponding encoded audio frames contained in the PES packets should be presented to the audio decoder for simultaneous synchronous decoding. If the PTS values do not match (indicating that the audio encoding was not frame synchronous) then the audio frames which are closest in time may be presented to the audio decoder for simultaneous decoding. In this case the two services may be reproduced out of sync by as much as 1/2 of a frame time (which is often satisfactory, e.g., a voice-over does not require precise timing).
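The pairing rule above reduces to: decode frames with identical PTS together; otherwise pick the associated-service frame whose PTS is nearest. A minimal sketch, representing frames as (pts, data) tuples (an illustrative structure, not a normative format):

```python
# Sketch of the frame-pairing rule described above. Frames with
# identical PTS are decoded together; otherwise each main-service
# frame is paired with the associated-service frame whose PTS is
# nearest (at worst 1/2 frame time away). The (pts, data) tuple
# representation is an illustrative assumption.

def pair_frames(main_frames, assoc_frames):
    pairs = []
    for pts, data in main_frames:
        nearest = min(assoc_frames, key=lambda f: abs(f[0] - pts))
        pairs.append(((pts, data), nearest))
    return pairs
```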

The value of stream_id for AC-3 shall be 1011 1101 (private_stream_1).

A. Services and features

1. Program guide

a) Master program guide PID

At the option of broadcasters, an interactive program guide database may be transmitted in the transport stream. If present, the master program guide data stream shall be transported in PID 0x1FFD. This PID shall be reserved exclusively for the program guide. The program guide shall be formatted according to the structure and syntax described in the standard developed from ATSC document T3/S8-050, "Program Guide for Digital Television". The program guide database allows a receiver to build an on-screen grid of program information and contains control information to facilitate navigation.

a) Program guide STD model

Each program guide bit stream shall adhere to an STD model that can be described by an MPEG smoothing buffer descriptor (Section 2.6.30 in ISO/IEC 13818-1) with the following constraints:

sb_leak_rate shall be 250 (indicating a leak rate of 100,000 bps)

sb_size shall be 1024 (indicating a smoothing buffer size of 1024 bytes)

Note that the smoothing buffer descriptor is referred to here to describe the STD model for the program guide, and does not imply that a smoothing buffer descriptor for the program guide is to be included in the PMT.

1. System information

a) System information PID

At the option of broadcasters, certain system information may be transmitted in the transport stream. If present, the system information data stream shall be transported in PID 0x1FFC. This PID shall be reserved exclusively for the system information. The system information shall be formatted according to the structure and syntax described in the standard developed from ATSC document T3/S8-079, "System Information for Digital Television". Constraints applying to specific transmission media are given in that standard.

a) System information STD model

The system information bit stream shall adhere to an STD model that can be described by an MPEG smoothing buffer descriptor (Section 2.6.30 in ISO/IEC 13818-1) with the following constraints:

sb_leak_rate shall be 50 (indicating a leak rate of 20,000 bps)

sb_size shall be 1024 (indicating a smoothing buffer size of 1024 bytes)

Note that the smoothing buffer descriptor is referred to here to describe the STD model for the system information, and does not imply that a smoothing buffer descriptor for the system information is to be included in the PMT.
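The coded sb_leak_rate values above are reconciled with the quoted bit rates by the units of the MPEG-2 smoothing_buffer_descriptor, in which sb_leak_rate is expressed in units of 400 bits per second:

```python
# sb_leak_rate in the MPEG-2 smoothing_buffer_descriptor is coded in
# units of 400 bits/s, which matches the values quoted above:
# 250 -> 100,000 bps (program guide), 50 -> 20,000 bps (system info).

SB_LEAK_RATE_UNIT = 400  # bits/s per coded unit (ISO/IEC 13818-1)

def leak_rate_bps(sb_leak_rate):
    return sb_leak_rate * SB_LEAK_RATE_UNIT
```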

1. Specification of private data services

Private data provides a means to add new ancillary services to the basic digital television service specified in this standard. Private data is supported in two bit stream locations.

1. Private data can be transmitted within the adaptation header of transport packets (Sections 2.4.3.4 and 2.4.3.5 of ISO/IEC 13818-1).

2. Private data can be transmitted as a separate transport stream with its own
PID. The contents can be identified as being ATSC private by using the private_data_indicator_descriptor (Section 2.6.29 of ISO/IEC 13818-1) within the PMT.

In either case, it is necessary that the standards which specify the characteristics of such private_streams be consistent with the Digital Television Standard. Standards for private_streams shall precisely specify the semantics of the transmitted syntax as described in Sections 5.6.3.1 and 5.6.3.1.1.

a) Verification model

The standard shall be specified in terms of a verification model by defining the characteristics of the transmitted syntax and an idealized decoder. In ISO/IEC 13818-1 and 13818-2, this is accomplished by using the T-STD and VBV models, respectively. The elements required for specification by this Standard are described in the following Sections.

(1) Syntax and semantics

The syntax and semantics of the transmitted bit stream that implements the ancillary service shall be completely and unambiguously specified. The decoding process shall also be completely and unambiguously specified.

(1) Ancillary service target decoder (ASTD)

An idealized decoder model must be precisely defined for the service. Figure 2 introduces a concrete model for pedagogic purposes. It is modeled after the T-STD.

The salient features of the model are the size of the transport demultiplexing buffer (TB), the minimum transfer rate out of the transport demultiplex buffer (Rleak), the required System buffering (BSsys), and optionally the partitioning of BSsys between the smoothing portion and the decoder portion. The decoding process, represented as the decoding times T_decode(i), must be completely specified. The behavior of the BSsys buffer must be completely modeled with respect to its input process and its output process. Certain parameters of the service such as bit rate, etc., should also be specified.

a) Stream type and PMT descriptors

A new ancillary service shall be described as a program or elementary stream through documented Program Specific Information.

Figure 2. Ancillary service target decoder.

(1) Stream type

Several identifiers that are part of the transport section of the Digital Television Standard may be used to identify either the signal or constituent parts thereof; however, the fundamental identifier is the User Private stream type. The stream_type codes shall be unambiguously assigned within the range 0x80 to 0xAF. 0x81 has already been assigned within the Digital Television Standard (see Section 5.7.1).

(1) PMT descriptors

The Ancillary Service specification shall include all pertinent descriptors that are found within the Program Map Table. Specifically, it is recommended that either the private_stream_identifier or the registration_descriptor, or both, be included. Although this is not required for a stream with a unique stream_type code within this Standard, it will enhance interoperability in the case where the stream is stored outside this Standard, or transmitted in some other network that has its own set of stream_type codes.

A. Assignment of identifiers

In this Section, those identifiers and codes which shall have a fixed value are summarized. These include PES stream IDs and descriptors. Stream_type codes from 0x80 to 0xAF shall be reserved for assignment as needed within the Digital Television Standard. Descriptor_tag codes from 0x40 to 0xAF shall be reserved for assignment as needed within the Digital Television Standard.

1. Stream type

The AC-3 audio stream_type shall have the value 0x81.

1. Descriptors

a) AC-3 audio descriptor

In the digital television system the AC-3 audio descriptor shall be included in the TS_program_map_section. The syntax is given in Table 2 of Annex A of ATSC Standard A/52. There are the following constraints on the AC-3 audio descriptor:

The value of the descriptor_tag shall be 0x81.

If textlen exists, it shall have a value of '0x00'.

a) Program smoothing buffer descriptor

The Program Map Table of each program shall contain a smoothing buffer descriptor pertaining to that program in accordance with Section 2.6.30 of ISO/IEC 13818-1. During the continuous existence of a program, the value of the elements of the smoothing buffer descriptor shall not change.

The fields of the smoothing buffer descriptor shall meet the following constraints:

The field sb_leak_rate shall be allowed to range up to the maximum transport rates specified in Section 7.2.

The field sb_size shall have a value less than or equal to 2048; the smoothing buffer thus need be no larger than 2048 bytes.

A. Extensions to the MPEG-2 Systems specification

This Section covers extensions to the MPEG-2 Systems specification.

1. Scrambling control

The scrambling control field within the packet header allows all states to exist in the digital television system as defined in Table 2.

Table 2 Transport Scrambling Control Field

Elementary Streams for which the transport_scrambling_control field does not exclusively have the value of '00' for the duration of the program, must carry a CA_descriptor in accordance with Section 2.6.16 of ISO/IEC 13818-1.

The implementation of a digital television delivery system that employs conditional access will require the specification of additional data streams and system constraints.

I. Features of 13818-1 not supported by this Standard

The transport definition is based on the MPEG-2 Systems standard, ISO/IEC 13818-1; however, it does not implement all parts of the standard. This Section describes those elements which are omitted from this Standard.

A. Program streams

This Standard does not include those portions of ISO/IEC 13818-1 and Annex A of ATSC Standard A/52 which pertain exclusively to Program Stream specifications.

A. Still pictures

This Standard does not include those portions of ISO/IEC 13818-1 Transport Stream specification which pertain to the Still Picture model.

I. Transport encoder interfaces and bit rates

A. Transport encoder input characteristics

The MPEG-2 Systems standard specifies the inputs to the transport system as MPEG-2 elementary streams. It is also possible that systems will be implemented wherein the process of forming PES packets takes place within the video, audio or other data encoders. In such cases, the inputs to the Transport system would be PES packets. Physical interfaces for these inputs (elementary streams and/or PES packets) may be defined as voluntary industry standards by SMPTE or other standardizing organizations.

A. Transport output characteristics

Conceptually, the output from the transport system is a continuous MPEG-2 transport stream as defined in this Annex at a constant rate of Tr Mbps when transmitted in an 8 VSB system and 2Tr when transmitted in a 16 VSB system, where:

Tr = 2 x (188/208) x (312/313) x Sr

and Sr is the symbol rate in Msymbols per second for the transmission subsystem (see Section 4.1 of Annex D). Tr and Sr shall be locked to each other in frequency.

All transport streams conforming to this Standard shall conform to the ISO/IEC 13818-1 model.

Details of the interface for this output, including its physical characteristics, may be defined as a voluntary industry standard by SMPTE or other standardizing organizations.

Annex D

(Normative)

RF/Transmission Systems Characteristics

I. Scope

This Annex describes the characteristics of the RF/Transmission subsystem, which is referred to as the VSB subsystem, of the Digital Television Standard. The VSB subsystem offers two modes: a terrestrial broadcast mode (8 VSB), and a high data rate mode (16 VSB). These are described in separate sections of this document.

I. Normative references

There are no Normative References.

I. Compliance notation

As used in this document, "shall" or "will" denotes a mandatory provision of the standard. "Should" denotes a provision that is recommended but not mandatory. "May" denotes a feature whose presence does not preclude compliance, that may or may not be present at the option of the implementor.

I. Transmission characteristics for terrestrial broadcast

A. Overview

The terrestrial broadcast mode (known as 8 VSB) will support a payload data rate of 19.28... Mbps in a 6 MHz channel. A functional block diagram of a representative 8 VSB terrestrial broadcast transmitter is shown in Figure 1. The input to the transmission subsystem from the transport subsystem is a 19.39... Mbps serial data stream comprised of 188-byte MPEG-compatible data packets (including a sync byte and 187 bytes of data which represent a payload data rate of 19.28... Mbps).

The incoming data is randomized and then processed for forward error correction (FEC) in the form of Reed-Solomon (RS) coding (20 RS parity bytes are added to each packet), 1/6 data field interleaving and 2/3 rate trellis coding. The randomization and FEC processes are not applied to the sync byte of the transport packet, which is represented in transmission by a Data Segment Sync signal as described below. Following randomization and forward error correction processing, the data packets are formatted into Data Frames for transmission and Data Segment Sync and Data Field Sync are added.

Figure 2 shows how the data are organized for transmission. Each Data Frame consists of two Data Fields, each containing 313 Data Segments. The first Data Segment of each Data Field is a unique synchronizing signal (Data Field Sync) and includes the training sequence used by the equalizer in the receiver. The remaining 312 Data Segments each carry the equivalent of the data from one 188-byte transport packet plus its associated FEC overhead. The actual data in each Data Segment comes from several transport packets because of data interleaving. Each Data Segment consists of 832 symbols. The first 4 symbols are transmitted in binary form and provide segment synchronization. This Data Segment Sync signal also represents the sync byte of the 188-byte MPEG-compatible transport packet. The remaining 828 symbols of each Data Segment carry data equivalent to the remaining 187 bytes of a transport packet and its associated FEC overhead. These 828 symbols are transmitted as 8-level signals and therefore carry three bits per symbol. Thus, 828 x 3 = 2484 bits of data are carried in each Data Segment, which exactly matches the requirement to send a protected transport packet:

187 data bytes + 20 RS parity bytes = 207 bytes
207 bytes x 8 bits/byte = 1656 bits
2/3 rate trellis coding requires 3/2 x 1656 bits = 2484 bits.

Figure 1. VSB transmitter.

Figure 2. VSB data frame.

The exact symbol rate is given by equation 1 below:

(1) Sr (MHz) = 4.5/286 x 684 = 10.76... MHz

The frequency of a Data Segment is given in equation 2 below:

(2) fseg = Sr/832 = 12.94... x 10^3 Data Segments/s.

The Data Frame rate is given by equation (3) below:

(3) fframe = fseg/626 = 20.66... frames/s.
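The rate arithmetic can be checked numerically. The sketch below (Python; variable names are illustrative, not from the standard) reproduces equations (1) through (3) and the 19.39/19.28 Mbps figures from the overview, using the fact that 312 of every 313 segments carry one 188-byte transport packet.

```python
# Rate arithmetic for 8 VSB, a numerical check of equations (1)-(3).
Sr = 4.5e6 / 286 * 684        # symbol rate, eq. (1): ~10.762 MHz
f_seg = Sr / 832              # segment rate, eq. (2): ~12.94e3 /s
f_frame = f_seg / 626         # frame rate,   eq. (3): ~20.66 /s

# 312 of every 313 segments carry one 188-byte transport packet
# (the remaining segment is Data Field Sync).
Tr = f_seg * (312 / 313) * 188 * 8        # transport rate: ~19.39 Mbps
payload = f_seg * (312 / 313) * 187 * 8   # payload (sync byte excluded): ~19.28 Mbps
```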

The symbol rate Sr and the transport rate Tr (see Section 7.2 of Annex C) shall be locked to each other in frequency.

The 8-level symbols combined with the binary Data Segment Sync and Data Field Sync signals shall be used to suppressed-carrier modulate a single carrier. Before transmission, however, most of the lower sideband shall be removed. The resulting spectrum is flat, except for the band edges where a nominal square root raised cosine response results in 620 kHz transition regions. The nominal VSB transmission spectrum is shown in Figure 3.

At the suppressed-carrier frequency, 310 kHz from the lower band edge, a small pilot shall be added to the signal.

Figure 3. VSB channel occupancy (nominal).

4.2 Channel error protection and synchronization

4.2.1 Prioritization

All payload data shall be carried with the same priority.

4.2.2 Data randomizer

A data randomizer shall be used on all input data to randomize the data payload (not including Data Field Sync or Data Segment Sync, or RS parity bytes). The data randomizer XORs all the incoming data bytes with a 16-bit maximum length pseudo random binary sequence (PRBS) which is initialized at the beginning of the Data Field. The PRBS is generated in a 16-bit shift register that has 9 feedback taps. Eight of the shift register outputs are selected as the fixed randomizing byte, where each bit from this byte is used to individually XOR the corresponding input data bit. The data bits are XORed MSB to MSB ... LSB to LSB.

The randomizer generator polynomial is as follows:

G(16) = X^16 + X^13 + X^12 + X^11 + X^7 + X^6 + X^3 + X + 1

The initialization (pre-load) to F180 hex (load to 1) occurs during the Data Segment Sync interval prior to the first Data Segment.

The randomizer generator polynomial and initialization are shown in Figure 4.

Figure 4. Randomizer polynomial.
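The PRBS can be sketched as a 16-bit Fibonacci LFSR (Python below). This is an illustration only: the polynomial and the F180 pre-load are from the text, but the register convention (bit n-1 holding stage X^n) is my assumption, and the tap-to-byte wiring that produces the actual randomizing byte is omitted; Figure 4 is normative. A maximum-length 16-bit register visits all 65535 nonzero states before repeating, which the `period` helper verifies.

```python
# Illustrative Fibonacci LFSR for the randomizer PRBS.
# Assumption: bit (n-1) of `state` holds stage X^n; the normative
# tap-to-byte wiring is in Figure 4 of the standard.
TAPS = (16, 13, 12, 11, 7, 6, 3, 1)   # exponents of G(16); the "+1" is implied

def lfsr_step(state):
    fb = 0
    for t in TAPS:
        fb ^= (state >> (t - 1)) & 1
    return ((state << 1) | fb) & 0xFFFF

def period(preload=0xF180):
    # count steps until the pre-load state recurs
    state, n = lfsr_step(preload), 1
    while state != preload:
        state, n = lfsr_step(state), n + 1
    return n
```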

4.2.3 Reed-Solomon encoder

The RS code used in the VSB transmission subsystem shall be a t = 10 (207,187) code. The RS data block size is 187 bytes, with 20 RS parity bytes added for error correction. A total RS block size of 207 bytes is transmitted per Data Segment.

In creating bytes from the serial bit stream, the MSB shall be the first serial bit. The 20 RS parity bytes shall be sent at the end of the Data Segment. The parity generator polynomial and the primitive field generator polynomial are shown in Figure 5.
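A compact sketch of a systematic RS(207,187) encoder is below (Python; function names are illustrative). The field generator x^8+x^4+x^3+x^2+1 and code-generator roots alpha^0 through alpha^19 are my assumptions about what Figure 5 shows; adjust if the figure differs. The self-check verifies that every syndrome of an encoded block evaluates to zero, which holds for any consistent choice of roots.

```python
# Sketch of a systematic RS(207,187) encoder over GF(256).
# Assumed parameters (check against Figure 5): field generator
# x^8+x^4+x^3+x^2+1, code generator roots alpha^0 .. alpha^19.
PRIM = 0x11D                        # x^8 + x^4 + x^3 + x^2 + 1
EXP, LOG = [0] * 510, [0] * 256
v = 1
for i in range(255):
    EXP[i], LOG[v] = v, i
    v <<= 1
    if v & 0x100:
        v ^= PRIM
for i in range(255, 510):
    EXP[i] = EXP[i - 255]           # wrap-around for gmul

def gmul(a, b):
    return 0 if 0 in (a, b) else EXP[LOG[a] + LOG[b]]

# g(x) = product of (x + alpha^i), i = 0..19; g[0] is the leading 1
g = [1]
for i in range(20):
    new = g + [0]                   # x * g
    for j, c in enumerate(g):
        new[j + 1] ^= gmul(c, EXP[i])   # + alpha^i * g
    g = new

def rs_encode(msg):                 # 187 data bytes -> 207-byte block
    rem = [0] * 20
    for b in msg:
        f = b ^ rem[0]
        rem = rem[1:] + [0]
        for j in range(20):
            rem[j] ^= gmul(g[j + 1], f)
    return list(msg) + rem          # 20 parity bytes sent at the end

def syndrome(block, i):             # evaluate block at alpha^i (Horner)
    s = 0
    for b in block:
        s = gmul(s, EXP[i]) ^ b
    return s
```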

4.2.4 Interleaving

The interleaver employed in the VSB transmission system shall be a 52 data segment (intersegment) convolutional byte interleaver. Interleaving is provided to a depth of about 1/6 of a data field (4 ms deep). Only data bytes shall be interleaved. The interleaver shall be synchronized to the first data byte of the data field. Intrasegment interleaving is also performed for the benefit of the trellis coding process.

The convolutional interleaver is shown in Figure 6.

Figure 5. Reed-Solomon (207,187) t=10 parity generator polynomial.

Figure 6. Convolutional interleaver (byte shift register illustration).
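A behavioral sketch of a B = 52 branch convolutional byte interleaver follows (Python; class and parameter names are mine). The per-branch increment of M = 4 bytes is my assumption, consistent with the stated ~4 ms depth; Figure 6 is normative. The matching deinterleaver uses the mirror-image delays, and the round trip restores the byte stream after a fixed end-to-end delay of B(B-1)M bytes.

```python
from collections import deque

class ConvByteInterleaver:
    # Branch i delays its bytes by i*M (interleaver) or (B-1-i)*M
    # (deinterleaver) commutator revolutions; the commutator advances
    # one branch per byte.
    def __init__(self, B=52, M=4, deinterleave=False):
        delays = [(B - 1 - i) * M if deinterleave else i * M
                  for i in range(B)]
        self.branches = [deque([0] * d) for d in delays]
        self.B, self.i = B, 0

    def push(self, byte):
        br = self.branches[self.i]
        self.i = (self.i + 1) % self.B
        if not br:                  # zero-delay branch passes through
            return byte
        br.append(byte)
        return br.popleft()
```

Feeding the interleaver output straight into the deinterleaver reproduces the input, offset by 52 x 51 x 4 = 10608 bytes of pipeline fill.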

4.2.5 Trellis coding

The 8 VSB transmission subsystem shall employ a 2/3 rate (R=2/3) trellis code (with one unencoded bit which is precoded). That is, one input bit is encoded into two output bits using a 1/2 rate convolutional code while the other input bit is precoded. The signaling waveform used with the trellis code is an 8-level (3 bit) one-dimensional constellation. The transmitted signal is referred to as 8 VSB. A 4-state trellis encoder shall be used.

Trellis code intrasegment interleaving shall be used. This uses twelve identical trellis encoders and precoders operating on interleaved data symbols. The code interleaving is accomplished by encoding symbols (0, 12, 24, 36, ...) as one group, symbols (1, 13, 25, 37, ...) as a second group, symbols (2, 14, 26, 38, ...) as a third group, and so on for a total of 12 groups.

In creating serial bits from parallel bytes, the MSB shall be sent out first: (7, 6, 5, 4, 3, 2, 1, 0). The MSB is precoded (7, 5, 3, 1) and the LSB is feedback convolutional encoded (6, 4, 2, 0). Standard 4-state optimal Ungerboeck codes shall be used for the encoding. The trellis code utilizes the 4-state feedback encoder shown in Figure 7. Also shown is the precoder and the symbol mapper. The trellis code and precoder intrasegment interleaver is shown in Figure 8 which feeds the mapper shown in Figure 7. Referring to Figure 8, data bytes are fed from the byte interleaver to the trellis coder and precoder, and they are processed as whole bytes by each of the twelve encoders. Each byte produces four symbols from a single encoder.

Figure 7. 8 VSB trellis encoder, precoder, and symbol mapper.
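The per-coder operation can be sketched as follows (Python). The 8-level mapper (Z2 Z1 Z0 mapped to 2(4Z2 + 2Z1 + Z0) - 7, i.e. 000 to -7 up through 111 to +7) and the mod-2 precoder follow the text; the exact feedback wiring of the 4-state encoder is an assumption of this sketch, and Figure 7 is normative.

```python
class TrellisCoder:
    """One of the twelve interleaved coders: a precoder on the MSB,
    a 4-state recursive rate-1/2 encoder on the LSB, and the 8-level
    symbol mapper. The encoder feedback wiring here is an assumption;
    Figure 7 of the standard is normative."""
    def __init__(self):
        self.pre = 0                 # precoder delay element
        self.s0 = self.s1 = 0        # encoder delay elements

    def step(self, x2, x1):          # x2 = precoded bit, x1 = encoded bit
        y2 = x2 ^ self.pre           # precoder: running mod-2 sum
        self.pre = y2
        z2, z1, z0 = y2, x1, self.s1
        # assumed recursive update: last delay fed back into the first
        self.s0, self.s1 = x1 ^ self.s1, self.s0
        return 2 * (4 * z2 + 2 * z1 + z0) - 7   # 000 -> -7 ... 111 -> +7
```

Each data byte feeds a single coder as four 2-bit symbols, MSB pair first, matching the (7, 5, 3, 1)/(6, 4, 2, 0) bit assignment above.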

The output multiplexer shown in Figure 8 shall advance by four symbols on each segment boundary. However, the state of the trellis encoder shall not be advanced. The data coming out of the multiplexer shall follow normal ordering from encoder 0 through 11 for the first segment of the frame, but on the second segment the order changes and symbols are read from encoders 4 through 11, and then 0 through 3. The third segment reads from encoders 8 through 11 and then 0 through 7. This three-segment pattern shall repeat through the 312 Data Segments of the frame. Table 1 shows the interleaving sequence for the first three Data Segments of the frame.

After the Data Segment Sync is inserted, the ordering of the data symbols is such that symbols from each encoder occur at a spacing of twelve symbols.

Figure 8. Trellis code interleaver.

Table 1 Interleaving Sequence

A complete conversion of parallel bytes to serial bits needs 828 bytes to produce 6624 bits. Data symbols are created from 2 bits sent in MSB order, so a complete conversion operation yields 3312 data symbols, which corresponds to 4 segments of 828 data symbols. 3312 data symbols divided by 12 trellis encoders gives 276 symbols per trellis encoder. 276 symbols divided by 4 symbols per byte gives 69 bytes per trellis encoder.

The conversion starts with the first segment of the field and proceeds with groups of 4 segments until the end of the field. 312 segments per field divided by 4 gives 78 conversion operations per field.
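The bookkeeping in the two paragraphs above reduces to a few lines of arithmetic:

```python
# Byte-to-symbol bookkeeping for the trellis interleaver (pure
# arithmetic, mirroring the text above).
bits = 828 * 8                  # bits in one conversion operation
symbols = bits // 2             # 2 data bits per 8-level symbol
assert bits == 6624 and symbols == 3312
assert symbols == 4 * 828       # i.e. 4 segments of 828 data symbols
assert symbols // 12 == 276     # symbols per trellis encoder
assert 276 // 4 == 69           # bytes per encoder (4 symbols per byte)
assert 312 // 4 == 78           # conversion operations per field
```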

During segment sync the input to 4 encoders is skipped and the encoders cycle with no input. The input is held until the next multiplex cycle and then fed to the correct encoder.

Table 2 details the byte to symbol conversion and the associated multiplexing of the trellis encoders. Segment 0 is the first segment of the field. The pattern repeats every 12 segments; segments 5 through 11 are not shown.

Table 2 Byte to Symbol Conversion, Multiplexing of Trellis Encoders

4.2.6 Data segment sync

The encoded trellis data shall be passed through a multiplexer that inserts the various synchronization signals (Data Segment Sync and Data Field Sync).

A two-level (binary) 4-symbol Data Segment Sync shall be inserted into the 8-level digital data stream at the beginning of each Data Segment. (The MPEG sync byte shall be replaced by Data Segment Sync.) The Data Segment Sync embedded in random data is illustrated in Figure 9.

A complete segment shall consist of 832 symbols: 4 symbols for Data Segment Sync, and 828 data plus parity symbols. The Data Segment Sync is binary (2-level). The same sync pattern occurs regularly at 77.3 µs intervals, and is the only signal repeating at this rate. Unlike the data, the four symbols for Data Segment Sync are not Reed-Solomon or trellis encoded, nor are they interleaved. The Data Segment Sync pattern shall be a 1001 pattern, as shown in Figure 9.

Figure 9. 8 VSB data segment.
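Assembling one transmitted segment is then straightforward. The sketch below (Python; names are illustrative) prepends the binary 1001 sync at its -5/+5 levels and checks the segment interval (832 symbols at the equation (1) symbol rate is about 77.3 µs).

```python
SEG_SYNC = [5, -5, -5, 5]        # the 1001 pattern at its binary +-5 levels

def build_segment(data_symbols):
    # data_symbols: 828 eight-level values (data + FEC) for one segment
    assert len(data_symbols) == 828
    return SEG_SYNC + list(data_symbols)

seg = build_segment([7] * 828)
assert len(seg) == 832
# segment interval: 832 symbols at the eq. (1) symbol rate
Sr = 4.5e6 / 286 * 684
assert round(832 / Sr * 1e6, 1) == 77.3   # microseconds
```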

4.2.7 Data field sync

The data are not only divided into Data Segments, but also into Data Fields, each consisting of 313 segments. Each Data Field (24.2 ms) shall start with one complete Data Segment of Data Field Sync, as shown in Figure 10. Each symbol represents one bit of data (2-level). The 832 symbols in this segment are defined below. Refer to Figure 10.

Figure 10. VSB data field sync.

a) Sync

This corresponds to Data Segment Sync and is defined as 1001.

b) PN511

This pseudo-random sequence is defined as X^9 + X^7 + X^6 + X^4 + X^3 + X + 1 with a pre-load value of 010000000. The sequence is:

c) PN63

This pseudo-random sequence is repeated three times. It is defined as X^6 + X + 1 with a pre-load value of 100111. The middle PN63 is inverted on every other Data Field Sync. The sequence is:

The generators for the PN63 and PN511 sequences are shown in Figure 11.

Figure 11. Field sync PN sequence generators.
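Both field-sync sequences are maximal-length LFSR sequences, so their periods can be checked directly. The sketch below (Python) steps a generic LFSR defined by the polynomial exponents; the bit-order convention for the pre-load is my assumption (it does not affect the period, since any nonzero state of a maximal-length register has the full period), and Figure 11 is normative for the actual wiring.

```python
def lfsr_period(exps, preload):
    # exps: exponents of the generator polynomial (constant term as 0);
    # preload: n-bit starting state. Counts steps until the state recurs.
    n = max(exps)
    start = state = tuple(preload)
    steps = 0
    while True:
        fb = 0
        for e in exps:
            if e < n:
                fb ^= state[e]
        state = state[1:] + (fb,)
        steps += 1
        if state == start:
            return steps

# PN63: X^6 + X + 1         PN511: X^9 + X^7 + X^6 + X^4 + X^3 + X + 1
assert lfsr_period([6, 1, 0], (1, 0, 0, 1, 1, 1)) == 63
assert lfsr_period([9, 7, 6, 4, 3, 1, 0], (0, 1, 0, 0, 0, 0, 0, 0, 0)) == 511
```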

d) VSB mode

These 24 bits determine the VSB mode for the data in the frame. The first two bytes are reserved. The suggested fill pattern is 0000 1111 0000 1111. The next byte is defined as:

where P is the even parity bit, the MSB of the byte, and A, B, C are the actual mode bits.

In the 8 VSB mode, the preceding bits are defined as:

e) Reserved

The last 104 bits shall be reserved space. It is suggested that this be filled with a continuation of the PN63 sequence. In the 8 VSB mode, 92 bits are reserved followed by the 12 symbol definition below.

f) Precode

In the 8 VSB mode, the last 12 symbols of the segment shall correspond to the last 12 symbols of the previous segment.

All sequences are pre-loaded before the beginning of the Data Field Sync.

Like the Data Segment Sync, the Data Field Sync is not Reed-Solomon or trellis encoded, nor is it interleaved.

4.3 Modulation

4.3.1 Bit-to-symbol mapping

Figure 7 shows the mapping of the outputs of the trellis encoder to the nominal signal levels of (-7, -5, -3, -1, 1, 3, 5, 7). As shown in Figure 9, the nominal levels of Data Segment Sync and Data Field Sync are -5 and +5. The value of 1.25 is added to all these nominal levels after the bit-to-symbol mapping function for the purpose of creating a small pilot carrier.

4.3.2 Pilot addition

A small in-phase pilot shall be added to the data signal. The frequency of the pilot shall be the same as the suppressed-carrier frequency as shown in Figure 3. This may be generated in the following manner. A small (digital) DC level (1.25) shall be added to every symbol (data and sync) of the digital baseband data plus sync signal (±1, ±3, ±5, ±7). The power of the pilot shall be 11.3 dB below the average data signal power.
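The 11.3 dB figure follows from the offset itself: the average power of equiprobable 8-level data is (1 + 9 + 25 + 49)/4 = 21, the 1.25 DC offset contributes 1.25 squared = 1.5625, and the ratio in decibels is about 11.3. In code (variable names are mine):

```python
import math

# Pilot-to-data power ratio for 8 VSB: equiprobable levels +-1..+-7
# versus the 1.25 DC pilot offset.
avg_data_power = sum(l * l for l in (1, 3, 5, 7)) / 4   # = 21
pilot_power = 1.25 ** 2                                 # = 1.5625
ratio_db = 10 * math.log10(avg_data_power / pilot_power)
assert round(ratio_db, 1) == 11.3
```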

4.3.3 8 VSB modulation method

The VSB modulator receives the 10.76 Msymbols/s, 8-level trellis encoded composite data signal (pilot and sync added). The ATV system performance is based on a linear phase raised cosine Nyquist filter response in the concatenated transmitter and receiver, as shown in Figure 12. The system filter response is essentially flat across the entire band, except for the transition regions at each end of the band. Nominally, the roll-off in the transmitter shall have the response of a linear phase root raised cosine filter.

5. Transmission characteristics for high data rate mode

5.1 Overview

The high data rate mode trades off transmission robustness (28.3 dB signal-to-noise threshold) for payload data rate (38.57 Mbps). Most parts of the high data rate mode VSB system are identical or similar to the terrestrial system. A pilot, Data Segment Sync, and Data Field Sync are all used to provide robust operation. The pilot in the high data rate mode also is 11.3 dB below the data signal power. The symbol, segment, and field signals and rates are all the same, allowing either receiver to lock up on the other's transmitted signal. Also, the data frame definitions are identical. The primary differences are the number of transmitted levels (8 versus 16) and the use of trellis coding and NTSC interference rejection filtering in the terrestrial system.

Figure 12. Nominal VSB system channel response
(linear phase raised cosine Nyquist filter).

The RF spectrum of the high data rate mode transmitter looks identical to the terrestrial system, as illustrated in Figure 3. Figure 13 illustrates a typical data segment, where the number of data levels is seen to be 16 due to the doubled data rate. Each portion of 828 data symbols represents 187 data bytes and 20 Reed-Solomon bytes followed by a second group of 187 data bytes and 20 Reed-Solomon bytes (before convolutional interleaving).

Figure 13. 16 VSB data segment.
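The doubled capacity is easy to verify: 828 symbols at 4 bits per symbol is 3312 bits, i.e. 414 bytes, exactly two RS(207,187) blocks per segment, which doubles the terrestrial payload to roughly the 38.57 Mbps quoted above. A quick numerical check (names are mine):

```python
# 16 VSB capacity check: 4 bits/symbol, no trellis overhead.
bits_per_segment = 828 * 4
assert bits_per_segment // 8 == 2 * 207     # two RS blocks per segment

# Payload: two 187-byte packets per data segment, 312 of 313 segments.
f_seg = (4.5e6 / 286 * 684) / 832
payload = f_seg * (312 / 313) * 2 * 187 * 8
assert 38.5e6 < payload < 38.6e6            # ~38.57 Mbps
```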

Figure 14 shows the block diagram of the transmitter. It is identical to the terrestrial VSB system except that the trellis coding is replaced with a mapper which converts data to multi-level symbols. See Figure 15.

Figure 14. 16 VSB transmitter.

Figure 15. 16 VSB mapper.

5.2 Channel error protection and synchronization

5.2.1 Prioritization

See Section 4.2.1.

5.2.2 Data randomizer

See Section 4.2.2.

5.2.3 Reed-Solomon encoder

See Section 4.2.3.

5.2.4 Interleaving

The interleaver shall be a 26 data segment inter-segment convolutional byte interleaver. Interleaving is provided to a depth of about 1/12 of a data field (2 ms deep). Only data bytes shall be interleaved.

5.2.5 Data segment sync

See Section 4.2.6.

5.2.6 Data field sync

See Section 4.2.7.

5.3 Modulation

5.3.1 Bit-to-symbol mapping

Figure 15 shows the mapping of the outputs of the interleaver to the nominal signal levels (-15, -13, -11, ..., 11, 13, 15). As shown in Figure 13, the nominal levels of Data Segment Sync and Data Field Sync are -9 and +9. The value of 2.5 is added to all these nominal levels after the bit-to-symbol mapping for the purpose of creating a small pilot carrier.

5.3.2 Pilot addition

A small in-phase pilot shall be added to the data signal. The frequency of the pilot shall be the same as the suppressed-carrier frequency as shown in Figure 3. This may be generated in the following manner. A small (digital) DC level (2.5) shall be added to every symbol (data and sync) of the digital baseband data plus sync signal (±1, ±3, ±5, ±7, ±9, ±11, ±13, ±15). The power of the pilot shall be 11.3 dB below the average data signal power.
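As with 8 VSB, the stated pilot level is consistent with the offset: the average 16-level data power is (1 + 9 + 25 + ... + 225)/8 = 85, and the ratio to the 2.5 squared = 6.25 pilot power is again about 11.3 dB. In code (names are mine):

```python
import math

# Pilot-to-data power ratio for 16 VSB: levels +-1..+-15, offset 2.5.
avg_data_power = sum(l * l for l in range(1, 16, 2)) / 8   # = 85
ratio_db = 10 * math.log10(avg_data_power / 2.5 ** 2)
assert avg_data_power == 85 and round(ratio_db, 1) == 11.3
```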

5.3.3 16 VSB modulation method

The modulation method shall be identical to that in Section 4, except the number of transmitted levels shall be 16 instead of 8.

Annex E

(Informative)

Receiver Characteristics

1. Scope

This informative Annex provides material to help readers understand and implement the normative portions of the Digital Television Standard. The normative clauses of the Standard do not specify the design of a receiver. Instead, they specify the transmitted bit stream and RF signal with a thoroughness sufficient to permit the design of a receiver.

Although the normative portions of the Standard are written in the traditional way – by specifying the signal format, not the receiver – the ATSC believes that the introductory phase of this new Standard can be made more orderly by listing some receiver design considerations in this informative Annex. Service providers need assurance that their programs will be correctly processed in all receivers, and receiver manufacturers need assurance that their receivers will function properly with all broadcasts.

This Annex also contains references to existing (both voluntary and mandatory) standards for television receivers and notes work in progress on voluntary industry standards being developed at this time.

2. References to existing or emerging standards

47 CFR Part 15, FCC Rules.

EIA IS-132, EIA Interim Standard for Channelization of Cable Television.

EIA IS-23, EIA Interim Standard for RF Interface Specification for Television Receiving Devices and Cable Television Systems.

EIA IS-105, EIA Interim Standard for a Decoder Interface Specification for Television Receiving Devices and Cable Television Decoders.

3. Compliance notation

Compliance with mandatory or voluntary standards and recommended practices for digital television receivers can be inferred only from previous experience with NTSC. Actual standards for digital television receivers have not been developed at this time. As used in this document, "appropriate" means that the referenced existing rules for NTSC contain most elements of the future rules for digital television. Furthermore, the rules may be expanded to cover digital television.

4. Status of receiver standardization activities

4.1 Tuner performance

The FCC Rules under 47 CFR Part 15 which are applicable to conventional television receivers are expected to be appropriate for digital television receivers.

4.1.1 Noise figure

The 10 dB noise figure used as a planning factor has been reviewed considering the needs of digital television reception and has been found appropriate.

4.1.2 Channelization plan for broadcast and cable

The cable channelization plan specified in the FCC Rules under 47 CFR Part 15, which is applicable to conventional television receivers, is expected to be appropriate for digital television receivers. Broadcast channelization is specified in the FCC Rules under 47 CFR Part 73.

4.1.3 Direct pickup

The FCC Rules under 47 CFR Part 15 which are applicable to conventional television receivers may be appropriate for digital television receivers, as well. Performance characteristics for reception of digital signals, whether standard or high definition, have not been developed by the industry. It is expected that direct pickup of a given level will have less effect on digital signals than on NTSC.

4.2 Transport

Significant work for identification of multiple programs within a single digital television channel has not taken place in the industry. It is recommended that a digital television receiver provide appropriate features to assist users in the selection of the desired video program service, if multiple video programs within one channel are offered.

4.3 Decoder interface

The FCC Rules which are to be adopted for a decoder interface on NTSC receivers advertised as "cable-ready" or "cable-compatible" are expected to be appropriate for digital television receivers. Much work has been done on this interface standard (IS-105) by the Joint Engineering Committee of EIA and NCTA. Although that interface standard is not intended to apply to digital television receivers, it will almost certainly provide a basis for a decoder interface standard applicable to them.

4.4 Digital data interface

Work on a digital data interface is being performed by the EIA's R-4.1 subcommittee on ATV Receiver Interfaces. R-4.1 intends to define a baseband serial digital interface so that devices may exchange packetized data, for example, when a digital VCR is connected to a digital television receiver.

It is recommended that manufacturers of digital television receivers wishing to include a digital data interface give consideration to the interface developed by R-4.1.

4.5 Conditional access interface

The National Renewable Security System (NRSS) Subcommittee of the Joint Engineering Committee of EIA and NCTA has the responsibility to develop a standard for a plug-in security module. The NRSS standard may be applied in either a standard definition or high definition environment.

It is recommended that manufacturers of digital television receivers wishing to include a conditional access interface give consideration to the NRSS standard developed by the JEC.

4.6 Closed captioning

Closed captioning for television is covered by the FCC Rules under 47 CFR Part 15 which are presently applied to conventional television receivers. These rules are expected to be appropriate for digital television receivers.

Work on defining the technical standard for closed captioning for the digital television system is being performed by the EIA's R-4.3 subcommittee.

5. Receiver functionality

5.1 Video

It is recommended that a digital television receiver be capable of appropriately decoding and displaying the video scanning formats defined in the Digital Television Standard and described in Table 3 "Compression Format Constraints" in Annex A of this Standard.

5.2 Audio

It is recommended that a digital television receiver be capable of selecting and decoding any audio service described in Section 6 of Annex B of this Standard, subject to the bit rate constraints in Section 5.3 of Annex B of this Standard.

It is recommended that a digital television receiver be capable of normalizing audio levels based on the value of the syntactical element dialnorm which is contained in the audio elementary stream.

It is recommended that a digital television receiver be capable of altering reproduced audio levels based on the value of the syntactical element dynrng which is contained in the audio elementary stream.

It is recommended that a digital television receiver provide appropriate features to assist users in the selection of program related audio services.

NOTE: The user's attention is called to the possibility that compliance with this standard may require use of an invention covered by patent rights. By publication of this standard, no position is taken with respect to the validity of this claim, or of any patent rights in connection therewith. The patent holder has, however, filed a statement of willingness to grant a license under these rights on reasonable and nondiscriminatory terms and conditions to applicants desiring to obtain such a license. Details may be obtained from the publisher.

1 The JCIC is presently composed of: the Electronic Industries Association (EIA), the Institute of Electrical and Electronics Engineers (IEEE), the National Association of Broadcasters (NAB), the National Cable Television Association (NCTA), and the Society of Motion Picture and Television Engineers (SMPTE).

2 FCC 92-438, MM Docket No. 87-268, "Memorandum Opinion and Order/Third Report and Order/Third Further Notice of Proposed Rule Making," Adopted: September 17, 1992, pp. 59-60.

3 ITU-R Document TG11/3-2, "Outline of Work for Task Group 11/3, Digital Terrestrial Television Broadcasting," June 30, 1992.

4 Chairman, ITU-R Task Group 11/3, "Report of the Second Meeting of ITU-R Task Group 11/3, Geneva, October 13-19, 1993," January 5, 1994, p. 40.

5 See ISO/IEC 13818-2, Section 8 for more information regarding profiles and levels.

6 Note that 1088 lines are actually coded in order to satisfy the MPEG-2 requirement that the coded vertical size be a multiple of 16 (progressive scan) or 32 (interlaced scan).

7 At some point in the future, the color gamut may be extended by allowing negative values of RGB and defining the transfer characteristics for negative RGB values.

8 In order to decode the user data, the decoder should properly recognize the 32-bit ATSC registration identifier at the PSI stream level (see ISO/IEC 13818-1).

9 Shaded cells in this table indicate syntactic and semantic additions to the ISO/IEC 13818-2 standard.

10 Syntax and semantics to be specified by EIA.

11 EIA, Recommended Practice for Advanced Television Closed Captioning, draft, July 1, 1994.