Yeah, personally I do think it has to be a pipeline, and therefore adds up to a frame of latency. Otherwise the Tensor or CUDA cores would be dark for roughly half the frame (i.e. sitting idle waiting for more data, which is a waste of time and energy), and each stage would only get a fraction of the frametime. Also, if the stages ran in series rather than concurrently, high framerates wouldn't affect DLSS's ability to work, since frametimes would just increase. With a pipelined approach the Tensor cores have to finish before the next frame is rendered, which would explain the switch around 60Hz.
Since it doesn't need a full frame to start work, though, there's presumably a fair bit of overlap, so it rarely actually reaches a full frame of extra latency.
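To put rough numbers on the serial vs pipelined argument, here's a toy model in Python. Everything in it is an assumption for illustration: the function names, the two-stage split into "render" and "DLSS", and the example timings are all made up, and it says nothing about how the hardware or driver actually schedules work.

```python
# Toy two-stage model: a render stage followed by a DLSS stage.
# All names and timings here are hypothetical, for illustration only.

def serial_frame_time(render_ms: float, dlss_ms: float) -> float:
    """Stages run back to back, so the frametime is simply their sum."""
    return render_ms + dlss_ms


def pipelined_frame_time(render_ms: float, dlss_ms: float) -> float:
    """Stages overlap across frames, so the slower stage sets the frametime."""
    return max(render_ms, dlss_ms)


def pipelined_keeps_up(dlss_ms: float, target_hz: float) -> bool:
    """In the pipelined case the DLSS stage must fit inside one frame interval."""
    return dlss_ms <= 1000.0 / target_hz


def extra_latency(render_ms: float, dlss_ms: float, start_fraction: float) -> float:
    """Extra latency after rendering ends, assuming (hypothetically) that DLSS
    can start once `start_fraction` of the frame has been rendered.
    start_fraction=1.0 means DLSS waits for the whole frame."""
    dlss_done = start_fraction * render_ms + dlss_ms
    return max(0.0, dlss_done - render_ms)


if __name__ == "__main__":
    render_ms, dlss_ms = 10.0, 8.0  # made-up timings
    print("serial frametime:", serial_frame_time(render_ms, dlss_ms))
    print("pipelined frametime:", pipelined_frame_time(render_ms, dlss_ms))
    print("keeps up at 60Hz:", pipelined_keeps_up(dlss_ms, 60.0))
    print("extra latency, no overlap:", extra_latency(render_ms, dlss_ms, 1.0))
    print("extra latency, half overlap:", extra_latency(render_ms, dlss_ms, 0.5))
```

In this model the serial case never "fails", it just gets slower, while the pipelined case has a hard cutoff once the DLSS stage no longer fits in a frame interval. The earlier DLSS can start, the less extra latency it adds.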