Indian Microprocessor: OpenSPARC T1 and T2 Processor Implementation

CMT (Chip Multithreading)

As discussed previously, OpenSPARC favors parallel computation across many hardware threads over deep pipelines that speed up a single instruction stream. Here is a diagram that summarizes the concept:

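For memory-intensive workloads, a conventional single-threaded processor spends most of its time stalled, waiting on memory or I/O operations. Chip multithreading hides this latency by issuing instructions from other threads while one thread waits.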
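A back-of-the-envelope model makes the stall argument concrete. The sketch below assumes that each thread alternates a fixed number of compute cycles with a fixed memory stall and that switching threads costs nothing. The function name and the 20/80-cycle workload are hypothetical, and real cores are less ideal than this.

```python
def core_utilization(compute_cycles: float, stall_cycles: float, threads: int) -> float:
    """Fraction of cycles in which the pipeline issues useful work.

    Idealized model (an assumption, not taken from the OpenSPARC docs):
    every thread runs compute_cycles of work, then waits stall_cycles on
    memory/I/O, and the core switches threads at zero cost.
    """
    if compute_cycles <= 0 or stall_cycles < 0 or threads < 1:
        raise ValueError("invalid parameters")
    single = compute_cycles / (compute_cycles + stall_cycles)
    # Each additional thread fills idle cycles until the pipeline saturates.
    return min(1.0, threads * single)


if __name__ == "__main__":
    # Hypothetical workload: 20 cycles of work followed by an 80-cycle stall.
    for n in (1, 2, 4):
        print(n, "thread(s):", core_utilization(20, 80, n))
```

With one thread the pipeline is busy only a fifth of the time in this model; four threads recover most of the idle cycles, which is the intuition behind giving each T1 core four strands.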

OpenSPARC T1 OVERVIEW AND COMPONENTS

OVERVIEW:

OpenSPARC T1 consists of eight physical cores, each with hardware support for 4 threads (called strands or virtual processors). The four strands are active simultaneously, and instructions from them are issued in round-robin fashion. If one strand stalls on a long-latency event, round-robin issue continues among the remaining 3 strands until the stalled strand becomes available again.
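The round-robin behavior can be sketched as a toy scheduler. This is an illustrative model under simplifying assumptions (one instruction issued per cycle, each strand either "ready" or "stalled"); the class and method names are hypothetical, and the real T1 thread-select logic involves more detail than this.

```python
from typing import List, Optional


class RoundRobinScheduler:
    """Toy model of per-core strand selection (hypothetical, simplified).

    Each call to pick() represents one cycle: it returns the next ready
    strand after the one picked last, skipping stalled strands.
    """

    def __init__(self, num_strands: int = 4) -> None:
        self.ready = [True] * num_strands
        self.last = num_strands - 1  # so that strand 0 is picked first

    def stall(self, strand: int) -> None:
        self.ready[strand] = False  # e.g. the strand missed in the cache

    def wake(self, strand: int) -> None:
        self.ready[strand] = True  # the long-latency event has completed

    def pick(self) -> Optional[int]:
        n = len(self.ready)
        for offset in range(1, n + 1):
            candidate = (self.last + offset) % n
            if self.ready[candidate]:
                self.last = candidate
                return candidate
        return None  # every strand is stalled: the pipeline idles this cycle


if __name__ == "__main__":
    sched = RoundRobinScheduler()
    trace: List[Optional[int]] = [sched.pick() for _ in range(4)]
    sched.stall(2)
    trace += [sched.pick() for _ in range(6)]
    sched.wake(2)
    trace += [sched.pick() for _ in range(4)]
    print(trace)
```

While strand 2 is stalled, the rotation simply cycles through strands 0, 1 and 3; once it wakes up, it rejoins the rotation.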

Each core has a 64-entry, fully associative TLB shared by its 4 strands. The 8 cores are connected to the 12-way associative L2 cache through the CCX (Cache Crossbar Switch). The diagram shows 4 lines entering the J-Bus and 8 connecting to the 4 DRAM control channels, which in turn connect to DDR2 SDRAM.

COMPONENTS

OpenSPARC T1 Physical Core

Each core supports 4 threads, and each strand has its own register file with 8 register windows. Most of the ASI (Address Space Identifier) registers, ASRs (Ancillary State Registers), and privileged registers are replicated per strand.

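The 4 strands share the instruction and data caches and the TLBs. Thanks to the "Autodemap" feature, multiple strands can update the TLB without locking: writing a new entry automatically removes (demaps) any existing entry that conflicts with it.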
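The autodemap idea can be illustrated with a toy shared TLB in which a single insert both removes any conflicting entry and writes the new one, so software never has to search-and-remove first. This is a hypothetical, heavily simplified model: it matches entries only on (context, virtual page number), uses FIFO replacement as an assumption, and ignores page sizes and the other details real hardware handles.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TTE:
    """Simplified TLB entry: context, virtual page number, physical page number."""
    context: int
    vpn: int
    ppn: int


class SharedTLB:
    """Toy fully associative TLB shared by several strands (hypothetical model)."""

    def __init__(self, capacity: int = 64) -> None:
        self.capacity = capacity
        self.entries: List[TTE] = []

    def insert(self, tte: TTE) -> None:
        # "Autodemap": drop entries for the same context and page as part of
        # the same write, so duplicates can never accumulate.
        self.entries = [e for e in self.entries
                        if (e.context, e.vpn) != (tte.context, tte.vpn)]
        if len(self.entries) >= self.capacity:
            self.entries.pop(0)  # FIFO replacement, an assumption
        self.entries.append(tte)

    def lookup(self, context: int, vpn: int) -> Optional[int]:
        for e in self.entries:
            if (e.context, e.vpn) == (context, vpn):
                return e.ppn
        return None
```

If two strands miss on the same page and both install the translation, the second insert demaps the first, leaving exactly one entry.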

Six stages of the core pipeline

1. Fetch
2. Switch
3. Decode
4. Execute
5. Memory
6. Writeback

The Switch stage contains an instruction register for each strand. The strand scheduler picks a strand, and that strand's current instruction is passed into the pipeline. At the same time, the hardware fetches the next instruction for that strand.

The instruction is then decoded in the Decode stage, and the register file is accessed at the same time.

In the Execute stage, all arithmetic and logical operations take place, and memory addresses are calculated. The data cache is accessed in the Memory stage, and the instruction is committed in the Writeback stage, where all traps are also signaled.
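To visualize how the six stages overlap, the sketch below prints an idealized pipeline chart in which one instruction enters per cycle and nothing stalls. Labelling instruction i with strand i mod 4 assumes all four strands are ready, so round-robin issue rotates through them; the function name is hypothetical.

```python
from typing import List

# Fetch, Switch, Decode, Execute, Memory, Writeback
STAGES = ["F", "S", "D", "E", "M", "W"]


def pipeline_chart(num_instructions: int, num_strands: int = 4) -> List[str]:
    """One text row per instruction showing which stage it occupies each cycle."""
    total_cycles = num_instructions + len(STAGES) - 1
    rows = []
    for i in range(num_instructions):
        cells = []
        for cycle in range(total_cycles):
            stage = cycle - i  # instruction i enters Fetch in cycle i
            cells.append(STAGES[stage] if 0 <= stage < len(STAGES) else ".")
        rows.append(f"strand {i % num_strands}: " + " ".join(cells))
    return rows


if __name__ == "__main__":
    for row in pipeline_chart(6):
        print(row)
```

Because consecutive instructions come from different strands, an instruction in Execute usually does not depend on the one right behind it, which is one reason a simple single-issue pipeline works well here.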

As mentioned in the previous post, the strand scheduler selects strands in round-robin order. If a strand encounters a long-latency instruction, that strand is not scheduled again until the instruction completes.

A single FPU is shared amongst all eight cores.

L2 cache

It is banked 4 ways, that is, split into four independent banks.
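A banked cache needs a rule for which bank holds a given address. The sketch below assumes 64-byte L2 lines and that the bank index is the two physical-address bits just above the line offset (PA[7:6]), so consecutive cache lines rotate across the four banks. Treat the exact bit positions as an assumption to check against the T1 microarchitecture specification.

```python
LINE_OFFSET_BITS = 6  # assumed 64-byte cache lines
NUM_BANKS = 4         # "banked 4 ways"


def l2_bank(physical_address: int) -> int:
    """L2 bank index for a physical address, assuming bits [7:6] select the bank."""
    return (physical_address >> LINE_OFFSET_BITS) & (NUM_BANKS - 1)


if __name__ == "__main__":
    # Eight consecutive 64-byte lines spread evenly over the four banks.
    print([l2_bank(a) for a in range(0, 8 * 64, 64)])
```

Interleaving on low-order line bits spreads sequential accesses over all banks, letting the crossbar serve requests to different banks in parallel.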

DRAM Controller

Each L2 cache bank interacts with exactly one DRAM controller. The CMP clock frequency must be 4 times the DRAM controller frequency.
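The 4:1 clock requirement is simple arithmetic, sketched here as two helper functions. The function names and the 1200 MHz example value are hypothetical; only the ratio comes from the text.

```python
import math

CMP_TO_DRAM_RATIO = 4  # CMP frequency = 4 x DRAM controller frequency


def dram_controller_mhz(cmp_mhz: float) -> float:
    """DRAM controller clock implied by a given CMP clock."""
    return cmp_mhz / CMP_TO_DRAM_RATIO


def ratio_ok(cmp_mhz: float, dram_mhz: float) -> bool:
    """Check a proposed pair of clocks against the 4:1 requirement."""
    return math.isclose(cmp_mhz, CMP_TO_DRAM_RATIO * dram_mhz)


if __name__ == "__main__":
    # Hypothetical CMP clock of 1200 MHz.
    print(dram_controller_mhz(1200.0))
    print(ratio_ok(1200.0, 300.0), ratio_ok(1200.0, 400.0))
```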

I/O Bridge Unit

The IOB performs an address decode on I/O addressable transactions and directs them to the appropriate internal block or to the appropriate external interface.

The J-Bus is the interconnect between the T1 and the I/O subsystem.
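The IOB's address decode amounts to matching an address against a table of ranges and forwarding the transaction to whichever block owns that range. The sketch below shows the technique with a made-up address map: the ranges and target names are hypothetical and are not the real OpenSPARC T1 I/O address map.

```python
from typing import List, Tuple

# HYPOTHETICAL address map for illustration only (not the real T1 map).
ADDRESS_MAP: List[Tuple[int, int, str]] = [
    (0x0000_0000, 0x0FFF_FFFF, "internal block A"),
    (0x1000_0000, 0x1FFF_FFFF, "internal block B"),
    (0x2000_0000, 0x7FFF_FFFF, "external: J-Bus"),
]


def route(address: int) -> str:
    """Return the destination of an I/O-addressable transaction."""
    for low, high, target in ADDRESS_MAP:
        if low <= address <= high:
            return target
    raise ValueError(f"unmapped I/O address {address:#x}")


if __name__ == "__main__":
    for addr in (0x0000_1234, 0x1800_0000, 0x2000_0000):
        print(hex(addr), "->", route(addr))
```

Hardware would do this with parallel comparators rather than a loop, but the mapping from address range to destination is the same idea.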

Clock and Test Unit

The CTU contains the clock generation, reset and JTAG circuitry.

T1 has a single PLL that takes the J-Bus clock as its reference; the PLL output is divided down to generate the CMP clock.
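The clock path can be written as one formula: the PLL multiplies the J-Bus reference up, and a divider brings the result down to the CMP clock. In the sketch below, `pll_multiplier`, `divider`, and the 150 MHz example are hypothetical parameters chosen for illustration, not T1's actual settings.

```python
def cmp_clock_mhz(jbus_ref_mhz: float, pll_multiplier: int, divider: int) -> float:
    """CMP clock derived from the J-Bus reference through a PLL and a divider.

    pll_multiplier and divider are hypothetical parameters for illustration.
    """
    if pll_multiplier < 1 or divider < 1:
        raise ValueError("multiplier and divider must be positive integers")
    pll_out = jbus_ref_mhz * pll_multiplier  # PLL output frequency
    return pll_out / divider                 # divided down to the CMP clock


if __name__ == "__main__":
    # Hypothetical: 150 MHz reference, x16 PLL (2400 MHz), divided by 2.
    print(cmp_clock_mhz(150.0, 16, 2))
```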

Summary of Differences between T1 and T2:

I MICRO-ARCHITECTURAL DIFFERENCES

Each OpenSPARC T2 core has two integer execution pipelines, whereas each T1 core has one.

Each physical core in T2 has 8 strands, divided into two groups of 4; each group shares a single integer pipeline. A T1 core, by contrast, has 4 strands sharing a single integer pipeline.
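The T2 arrangement can be sketched as two independent schedulers, one per integer pipeline, each rotating through its own group of 4 strands. The sketch assumes strands 0-3 form group 0 and strands 4-7 form group 1, and it reuses simple round-robin selection for illustration; the names are hypothetical and the real T2 pick logic may differ.

```python
from typing import List, Optional

STRANDS_PER_CORE = 8
STRANDS_PER_GROUP = 4  # 4 strands share one integer pipeline
NUM_PIPELINES = STRANDS_PER_CORE // STRANDS_PER_GROUP  # 2 on T2


def pipeline_for_strand(strand: int) -> int:
    """Pipeline index for a strand, assuming strands 0-3 -> 0 and 4-7 -> 1."""
    if not 0 <= strand < STRANDS_PER_CORE:
        raise ValueError("strand must be in 0..7")
    return strand // STRANDS_PER_GROUP


def issue_one_cycle(ready: List[bool], last: List[int]) -> List[Optional[int]]:
    """Each pipeline picks its next ready strand in round-robin order.

    ready[s] tells whether strand s can issue; last[p] is the strand that
    pipeline p picked most recently and is updated in place. Returns one
    strand (or None) per pipeline, so a core issues up to two instructions
    per cycle.
    """
    picks: List[Optional[int]] = []
    for p in range(NUM_PIPELINES):
        base = p * STRANDS_PER_GROUP
        choice: Optional[int] = None
        for offset in range(1, STRANDS_PER_GROUP + 1):
            s = base + (last[p] - base + offset) % STRANDS_PER_GROUP
            if ready[s]:
                choice = s
                last[p] = s
                break
        picks.append(choice)
    return picks


if __name__ == "__main__":
    ready = [True] * STRANDS_PER_CORE
    last = [3, 7]  # so each group starts with its first strand
    for _ in range(3):
        print(issue_one_cycle(ready, last))
```

A stall in one group never blocks the other: if all of strands 0-3 are waiting, pipeline 1 keeps issuing from strands 4-7.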

Saturday, July 24, 2010
