This first one may be the solution to the quoted problem:
And here we go...While the 68k has the bus, the DSP and GPU can apparently run their own code, but they can not touch any of the other hardware, even on the same chip. (For instance, if the GPU wants to talk to the line buffers or even the video registers, even though they are on the same chip, it needs to acquire the bus --
Emboldened areas added by moderatorKskunk wrote:It's usually several cycles. You can't interrupt a 68K bus cycle in progress. (This is a 68K limitation.) The 68K bus cycle takes 8 Jag RISC cycles. So depending on your luck you could wait anywhere from 1 to 8 cycles. It's almost inevitable that the 68K has changed the current DRAM page, which makes it 6 to 13 cycles round trip.Tursi wrote:Even though they have higher bus priority and can steal the bus away from the 68k, they still lose a cycle doing so and it's noticeable in throughput.
This works out fine for simple 2D games, since the OP wants to lock up the bus while it works (the overhead is minimal) and if you're using the blitter, it's probably being activated at the end of a 68K bus cycle.
But once you're using the GPU and DSP, the 68K is a pretty bad idea. The DSP is also pretty difficult to use for all the same reasons. It locks up the bus for 6 cycles on reads, 12 for writes (because of the workaround for the write bug). It's better than the 68K because it can run code out of its SRAM, but it's often hard to find algorithms that do a ton of computation and almost no I/O. That's one reason the DSP is usually idle except for some sound synthesis.
Who do you know that can read netlists?Tursi wrote:While the 68k has the bus, the DSP and GPU can apparently run their own code, but they can not touch any of the other hardware, even on the same chip. (For instance, if the GPU wants to talk to the line buffers or even the video registers, even though they are on the same chip, it needs to acquire the bus -- at least according to my experimentation. Someone who can read the netlists may be able to confirm or deny that more accurately.)
Both the GPU and DSP have a dedicated 'local bus' that can be accessed without using the main bus. On the GPU, the local bus contains SRAM, GPU control regs (for interrupts, divider, matrix, etc), the blitter, and line buffer writes (everything mapped from F02-F0F). This helps it set up polygons when scan converting without needing the bus.
The line buffer feature only works on 32-bit writes (no reads or 16-bit access), but apparently does not disturb the main bus. I'm far away from my Jaguar right now so I can't test if this really works. The netlists imply it was designed to work this way. This feature might be useful for some kind of special effect but I'm not creative enough to think of any.
On the DSP, the local bus contains SRAM, DSP control regs, math table ROM and some DSP peripherals (everything mapped from F12-F1F). This means the DSP is able to do CD access (at least the serial kind) and audio playback without touching the main bus. Off-chip stuff like joystick access obviously requires external bus cycles, but so do the UART and timers.
This appears to not be so much something new as it is something perhaps overlooked by the above two having the conversation. From page 36 of the v8 Jaguar Tech ref manual:
To the GPU programmer the local RAM, local hardware registers, and external memory all appear in the same
address space. The GPU memory controller determines whether a transfer is local or external, and generates
the appropriate cycle. The only difference to the programmer is that only 32-bit transfers are possible within the
GPU local address space, whereas 8, 16, 32 or 64-bit transfers are permitted externally.
The local RAM sits on an internal GPU 32-bit bus. Also present on this bus are various GPU control registers,
and the Blitter control registers. When a GPU transfer occurs outside the local address space, a gateway
connects the local bus to the main bus. If a sixty-four bit transfer is requested, a special register is used for the
other half of the data.
The address space is organised as follows:
F02000 - F021FF graphics processor control registers
F02200 - F022FF Blitter registers
F02300 - F02FFF reserved
F03000 - F03FFF local RAM
F04000 - F0FFFF reserved
This local address space is also available to external devices via the I/O mechanism.
The GPU local bus can therefore perform transfers for three quite separate mechanisms. These are, in
decreasing order of priority:
- CPU I/O access
- Operand data transfer
- Instruction fetch