As discussed previously, OpenSPARC favors parallel computation over long pipelined instructions. The diagram below summarizes the concept:
[Figure: parallel computation versus long pipelined instructions]
Most of the time, a processor is stalled waiting on memory or I/O operations, so interleaving work from several threads keeps the core busy during those waits.
OpenSPARC T1 OVERVIEW AND COMPONENTS
OVERVIEW:
OpenSPARC T1 consists of eight physical cores, each with hardware support for 4 threads (also called strands or virtual processors). The four strands are active at the same time, and their instructions are issued in round-robin fashion. If a strand is stuck on a long-latency event, the round robin continues among the remaining 3 strands until the 4th strand becomes available again.
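The round-robin behavior described above can be sketched as a small simulation. This is only an illustration: the function name `round_robin_schedule`, the `stalls` table, and the one-instruction-per-cycle model are assumptions, not details of the real hardware.

```python
def round_robin_schedule(num_strands, cycles, stalls):
    """Return the strand chosen on each cycle (None if every strand is stalled).

    `stalls` maps a strand id to a half-open (start_cycle, end_cycle) interval
    during which that strand is waiting on a long-latency event.
    """
    issued = []
    last = num_strands - 1  # so that strand 0 is considered first
    for cycle in range(cycles):
        chosen = None
        # Walk the strands in round-robin order, starting after the last one issued.
        for offset in range(1, num_strands + 1):
            candidate = (last + offset) % num_strands
            start, end = stalls.get(candidate, (0, 0))
            if not (start <= cycle < end):
                chosen = candidate
                break
        issued.append(chosen)
        if chosen is not None:
            last = chosen
    return issued


# Strand 3 is stalled from cycle 2 up to (not including) cycle 9.
print(round_robin_schedule(num_strands=4, cycles=12, stalls={3: (2, 9)}))
# prints [0, 1, 2, 0, 1, 2, 0, 1, 2, 3, 0, 1]
```

While strand 3 is stalled, the other three strands take turns; strand 3 rejoins the rotation as soon as its stall ends.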
[Figure: OpenSPARC T1 block diagram]
Each core has a 64-entry, fully associative TLB that is shared by its 4 strands. The 8 cores are connected to the 12-way associative L2 cache through the CCX (CPU-Cache Crossbar). The diagram shows 4 lines entering the J-Bus and 8 lines connected to the 4 DRAM controller channels, which in turn connect to DDR2 SDRAM.
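To make "12-way associative" more concrete, here is a rough sketch of how a physical address could be split into tag, set index, bank, and line offset. Only the 12 ways come from the text above; the 3 MB capacity, 64-byte lines, 4 banks, and the use of the low line-address bits to pick a bank are assumptions made for illustration.

```python
WAYS = 12                     # from the text: 12-way associative L2
CAPACITY_BYTES = 3 * 2**20    # assumption: 3 MB total capacity
LINE_BYTES = 64               # assumption: 64-byte cache lines
NUM_BANKS = 4                 # assumption: 4 banks

# Number of sets in each bank under the assumptions above.
SETS_PER_BANK = CAPACITY_BYTES // (NUM_BANKS * WAYS * LINE_BYTES)


def split_address(addr):
    """Split a physical address into (tag, set_index, bank, offset)."""
    offset = addr % LINE_BYTES
    line = addr // LINE_BYTES
    bank = line % NUM_BANKS                          # assumption: banks interleave per line
    set_index = (line // NUM_BANKS) % SETS_PER_BANK
    tag = line // (NUM_BANKS * SETS_PER_BANK)
    return tag, set_index, bank, offset


print(SETS_PER_BANK)
print(split_address(0x12345678))
```

A lookup would then compare the tag against all 12 ways of the selected set in the selected bank.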
COMPONENTS:
- OpenSPARC T1 Physical Core
Each core supports 4 threads, with one register file per strand. Each register file has 8 register windows. Most of the ASI (Address Space Identifier), ASR (Ancillary State Register), and privileged registers are replicated per strand.
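As a rough illustration of what 8 register windows mean for one strand, the sketch below maps a window-relative register number to a slot in a windowed register file. It assumes the SPARC V9 convention that the outs of window CWP are the ins of window CWP+1; the function name `physical_slot` and the slot layout (16 slots per window, locals first) are hypothetical and not taken from the T1 design.

```python
NWINDOWS = 8  # from the text: 8 register windows per strand


def physical_slot(cwp, r):
    """Map register r (8..31) in window `cwp` to a slot in a 128-entry file.

    r8-r15 are outs, r16-r23 are locals, r24-r31 are ins.
    Hypothetical layout: each window owns 16 slots, locals at 0-7, ins at 8-15.
    """
    if not 8 <= r <= 31:
        raise ValueError("r0-r7 are globals and are not windowed")
    if r < 16:
        # Outs alias the ins of the next window (assumed SPARC V9 overlap).
        window, index = (cwp + 1) % NWINDOWS, 8 + (r - 8)
    elif r < 24:
        window, index = cwp, r - 16          # locals are private to the window
    else:
        window, index = cwp, 8 + (r - 24)    # ins
    return window * 16 + index


# The caller's %o0 (r8) in window 0 is the callee's %i0 (r24) in window 1.
print(physical_slot(0, 8) == physical_slot(1, 24))
# prints True
```

With this overlap, 8 windows need only 8 x 16 = 128 windowed slots per strand, plus the global registers, which are handled separately.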
The 4 strands share the instruction and data caches and the TLBs. Multiple strands can update a TLB without locking, thanks to the "Autodemap" feature.
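Here is a minimal sketch of a shared, fully associative TLB, assuming that autodemap means "writing a new entry automatically removes any existing entry for the same page". The class name `SharedTLB`, the `(context, vpn, ppn)` entry format, and the FIFO replacement policy are all hypothetical.

```python
class SharedTLB:
    """Toy fully associative TLB shared by several strands."""

    def __init__(self, capacity=64):
        self.capacity = capacity
        self.entries = []  # list of (context, vpn, ppn), oldest first

    def insert(self, context, vpn, ppn):
        # Autodemap (assumed behavior): drop any entry that already maps this page,
        # so two strands inserting the same translation never create duplicates.
        self.entries = [e for e in self.entries if (e[0], e[1]) != (context, vpn)]
        if len(self.entries) >= self.capacity:
            self.entries.pop(0)  # hypothetical FIFO replacement
        self.entries.append((context, vpn, ppn))

    def lookup(self, context, vpn):
        # Fully associative: every entry is a candidate for a match.
        for ctx, page, ppn in self.entries:
            if (ctx, page) == (context, vpn):
                return ppn
        return None


tlb = SharedTLB()
tlb.insert(context=1, vpn=0x10, ppn=0xA)  # written by one strand
tlb.insert(context=1, vpn=0x10, ppn=0xB)  # same page written by another strand
print(len(tlb.entries), hex(tlb.lookup(1, 0x10)))
# prints 1 0xb
```

Because the older mapping is removed as part of the write itself, the strands do not need to take a lock to keep the TLB free of conflicting entries.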
The core pipeline has six stages:
1. Fetch
2. Switch
3. Decode
4. Execute
5. Memory
6. Writeback
[Figure: OpenSPARC T1 core pipeline]
The Switch stage holds a strand instruction register for each strand. The strand scheduler picks a strand, and the current instruction for that strand is passed into the pipe. At the same time, the hardware fetches the next instruction for that strand.
The instruction is then decoded in the Decode stage, and the register file is accessed at the same time.
In the Execute stage, all arithmetic and logical operations take place, and the memory address is calculated. The data cache is accessed in the Memory stage, and the instruction is committed in the Writeback stage. All traps are signaled in the Writeback stage.
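The flow through the six stages can be sketched with a toy timing model. It assumes an ideal pipeline in which one instruction enters Fetch every cycle, every stage takes exactly one cycle, and nothing stalls; the function names are made up for this example.

```python
STAGES = ["Fetch", "Switch", "Decode", "Execute", "Memory", "Writeback"]


def stage_at(issue_cycle, cycle):
    """Stage occupied at `cycle` by an instruction that entered Fetch at `issue_cycle`."""
    index = cycle - issue_cycle
    return STAGES[index] if 0 <= index < len(STAGES) else None


def occupancy(num_instructions, cycles):
    """For each cycle, map stage name -> instruction number (instruction i enters at cycle i)."""
    table = []
    for cycle in range(cycles):
        row = {}
        for instr in range(num_instructions):
            stage = stage_at(instr, cycle)
            if stage is not None:
                row[stage] = instr
        table.append(row)
    return table


for cycle, row in enumerate(occupancy(num_instructions=4, cycles=9)):
    print(cycle, row)
```

In this model, instruction 0 reaches Writeback at cycle 5, and at cycle 3 the first four stages each hold a different instruction.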
As mentioned in the previous post, the strand scheduler uses a round-robin policy to pick which strand issues next. If a strand encounters a long-latency instruction, that strand is not scheduled again until the instruction completes.
- FPU
- L2 cache
- DRAM Controller
- I/O Bridge Unit
- J-Bus is the interconnect between the T1 and the I/O subsystem.
- Clock and Test Unit
T1 has a single PLL that takes the J-Bus clock as its reference; the PLL output is divided down to generate the CMP clock.
Summary of Differences between T1 and T2:
I MICRO-ARCHITECTURAL DIFFERENCES
OpenSPARC T2 has two integer execution pipelines whereas T1 has one.
Each physical core in T2 has 8 strands, divided into two groups of 4 strands, with each group sharing one integer pipeline, whereas a T1 core has 4 strands sharing a single integer pipeline.
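The difference in strand grouping can be sketched as a simple mapping from strand number to integer pipeline. It assumes strands are assigned to pipelines in contiguous groups (strands 0-3 to pipeline 0, strands 4-7 to pipeline 1); the names `integer_pipeline`, `T1`, and `T2` are only for illustration.

```python
def integer_pipeline(strand, strands_per_core, pipelines_per_core):
    """Return the integer pipeline used by `strand`, assuming contiguous grouping."""
    group_size = strands_per_core // pipelines_per_core
    return strand // group_size


T1 = dict(strands_per_core=4, pipelines_per_core=1)
T2 = dict(strands_per_core=8, pipelines_per_core=2)

print([integer_pipeline(s, **T1) for s in range(4)])
# prints [0, 0, 0, 0]
print([integer_pipeline(s, **T2) for s in range(8)])
# prints [0, 0, 0, 0, 1, 1, 1, 1]
```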