How to generate a GPMC_WAIT signal for SDRAM access?

Hello All,

I would like to use the 32MB SDRAM on the LogiBone as shared memory between the FPGA and the Cortex-A8. For the time being, I am just trying to access the SDRAM from the Cortex. This task requires a module that acts as a bridge between the GPMC and the SDRAM controller. One of the first things I noticed is that the GPMC_WAIT signal is missing on the LogiBone. This pin is responsible for flow control on the GPMC bus (critical when the read access time is not deterministic, as is the case with SDRAM devices) as well as for sending an interrupt to the Cortex-A8. JPiat suggested in another post using a preloaded buffer implemented in the FPGA (this buffer would act as a cache memory). This design solves the access time issue, but it would require implementing some kind of cache coherency and flushing mechanism, and the design can get really complex.

If I am not mistaken, the GPMC_WAIT_0 pin corresponds to P9-11, and the pin next to it, P9-12, is GPMC_BE1N, which I am not using because I always access the device using 16-bit words. My question is: can I wire the two pins together and use the GPMC_BE1N pin as the GPMC_WAIT signal? I guess this requires modifying the GPMC pinout configuration in the device tree, so that the AM3358 does not try to drive the BE1N signal. Would that do the trick? Any other ideas or suggestions are welcome.

Thanks in advance,
JC

Comments

  • Hi,

    This is a very good idea and it will clearly help with accessing the SDRAM from the Cortex. The option you propose would work; you would just need to cut the connector pin that connects BE1N to the BBB and solder BE1N and GPMC_WAIT together on the LOGI-Bone. Another option is to use one of the pins of the Arduino connector and wire it to GPMC_WAIT. I haven't tried GPMC_WAIT myself, and I am not sure how to configure it on the device-tree side.

    I am trying to get the SDRAM to work as a large FIFO that would allow sharing large amounts of data between the FPGA and the processor. FIFO-style access makes cache coherency management easier but may not work for your specific application. My first results show that mixed writes and reads to the FIFO can allow pretty fast accesses (~60MB/s).
    Direct accesses to the SDRAM from the Cortex would be nice, but you will get poor transfer rates (at least on reads) because single accesses have a pretty high latency.

    Keep us updated on your choices and progress.

    Regards,

    Jonathan Piat
  • Hi Jonathan,

    I have tried different solutions, none of them successful so far. As you correctly pointed out on Github, crossing clock domains is a slow operation if we want to avoid metastability (especially on the way out, because the clock frequency is lower). Additionally, I need to adapt the data bandwidth between the GPMC and the SDRAM. Dual-clock FIFOs have a significant delay (around 5 clock cycles), so I have ruled out that option. Clock gating on the SDRAM could be another option, but I would need to extend the state machine to generate the enable signal.

    I also tried to select between sys_clk and gpmc_clk as the SDRAM clock using an instance of BUFGMUX with gpmc_cen as the control signal, but the routing tool does not allow me to do this. I am not entirely sure it can be done, but if we can clock the FPGA using P85 and P95, there should be a way to switch dynamically between the two of them, or am I talking nonsense? I have read UG382 several times and I cannot get a clear picture of the clock network.

    I am still working on it. I will let you know.

    Thanks,
    JC
  • The problem is not only about crossing clock domains. The SDRAM access is inherently slow because of the initial addressing latency, which is 11 clock cycles for four words. The burst mode of the SDRAM (back-to-back reads/writes) allows better performance but is not possible with the GPMC. Accessing the SDRAM through the GPMC would result in the following:

    1) one GPMC clock cycle to set the address: 20 ns (50 MHz GPMC clock)
    2) two system clock cycles for clock domain crossing: 20 ns (assuming a 100 MHz clock)
    3) 11 SDRAM clock cycles for the SDRAM access: 110 ns (assuming a 100 MHz SDRAM synchronous to the system clock)
    4) one or two GPMC clock cycles to finalize the access: up to 40 ns

    A single 16-bit access would thus cost ~200 ns. Since the SDRAM access time is predictable, you don't even have to include wait states in the GPMC access; you can compute the read access time for the SDRAM and configure the GPMC accordingly.
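
    As a rough check of the budget above, the four steps can be added up in a few lines of Python. This is only a sketch: the function and constant names are illustrative, the clock frequencies are the ones assumed in the list, and step 4 is taken at its two-cycle worst case. The exact sum is 190 ns, which the estimate above rounds to ~200 ns; at that cost, back-to-back single 16-bit accesses would sustain roughly 10.5 MB/s.

```python
# Latency budget for one 16-bit SDRAM read through the GPMC, using the
# figures quoted in the list above. All names here are illustrative.

GPMC_CLK_HZ = 50e6    # GPMC clock assumed in the post
SYS_CLK_HZ = 100e6    # system/SDRAM clock assumed in the post


def cycles_to_ns(cycles, clk_hz):
    """Convert a cycle count at clk_hz into nanoseconds."""
    return cycles * 1e9 / clk_hz


def single_access_ns(finalize_gpmc_cycles=2):
    """Sum the four steps of a single access (worst case for step 4)."""
    address = cycles_to_ns(1, GPMC_CLK_HZ)    # 1) address setup
    crossing = cycles_to_ns(2, SYS_CLK_HZ)    # 2) clock domain crossing
    sdram = cycles_to_ns(11, SYS_CLK_HZ)      # 3) SDRAM access latency
    finalize = cycles_to_ns(finalize_gpmc_cycles, GPMC_CLK_HZ)  # 4) finalize
    return address + crossing + sdram + finalize


def throughput_mb_per_s(access_ns, bytes_per_access=2):
    """Sustained rate if every access costs access_ns (1 MB = 1e6 bytes)."""
    return bytes_per_access / (access_ns * 1e-9) / 1e6


if __name__ == "__main__":
    total = single_access_ns()
    print(f"single access: {total:.0f} ns")
    print(f"throughput: {throughput_mb_per_s(total):.1f} MB/s")
```

    The same helper can be used to pick the GPMC read access time: anything shorter than the computed total would sample the data bus before the SDRAM has answered.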

    What you are trying to do with the SDRAM clock (selecting between sys_clk and gpmc_clk) is called clock gating, and it won't work because the synthesizer does not like clocks that go from the clock routes to the logic routes. It is also not a good idea because GPMC_CLK is inactive when no transfer occurs (the clock is active only during the access time).


    Do you really need shared memory? I got my SDRAM-based FIFO to work last night, and I can now have a 16MByte FIFO in SDRAM with a cache that allows fast access to the FIFO.

    If you really need the SDRAM as shared memory, I really think that you have to go with a small cache. Cache flush/refresh can be controlled through the GPMC bus and would still perform better than addressing the SDRAM directly over the GPMC bus. It could work this way:

    1) create an architecture with a small memory (1KByte) in the FPGA fabric, connected on one side to the wishbone bus
    2) create a cache controller connected to the wishbone bus, composed of three registers: cache base address, cache status, cache control
    3) the cache controller would be in charge of the following:
        - trigger a read from the SDRAM starting at the cache base address and write the data to the cache when the cache refresh bit is set in cache control. When done, set the cache refreshed bit in the cache status register
        - trigger a write to the SDRAM starting at the cache base address, with data read from the cache, when the cache flush bit is set in cache control. When done, set the cache flushed bit in the cache status register

    Writing to the SDRAM would then simply be a matter of writing to the cache and triggering a flush to the configured address; a read would be performed by triggering a cache refresh and polling the cache refreshed bit in the cache status register. The SDRAM access speed would be optimal because the cache controller would only perform burst accesses.
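
    The refresh/flush protocol described above can be sketched as a small behavioural model in Python. This is not the actual controller: the bit positions, the class, and the helper names are hypothetical (the proposal only names the three registers), and the model completes each burst instantly, whereas real hardware would take some time before the status bit is set.

```python
# Behavioural model of the proposed cache controller, seen from the host.
# Bit positions and names are hypothetical; only the three registers
# (base address, status, control) come from the proposal.

CACHE_WORDS = 512        # 1 KByte of 16-bit words
CTRL_REFRESH = 0x1       # hypothetical control bit: SDRAM -> cache
CTRL_FLUSH = 0x2         # hypothetical control bit: cache -> SDRAM
STAT_REFRESHED = 0x1     # hypothetical status bit: refresh finished
STAT_FLUSHED = 0x2       # hypothetical status bit: flush finished


class CacheControllerModel:
    def __init__(self, sdram_words):
        self.sdram = [0] * sdram_words
        self.cache = [0] * CACHE_WORDS
        self.base = 0        # cache base address register (word address)
        self.status = 0      # cache status register

    def write_control(self, value):
        """Model a write to cache control; the burst completes at once."""
        if self.base + CACHE_WORDS > len(self.sdram):
            raise ValueError("cache window exceeds SDRAM size")
        window = slice(self.base, self.base + CACHE_WORDS)
        if value & CTRL_REFRESH:
            self.status &= ~STAT_REFRESHED
            self.cache[:] = self.sdram[window]   # burst read into the cache
            self.status |= STAT_REFRESHED
        if value & CTRL_FLUSH:
            self.status &= ~STAT_FLUSHED
            self.sdram[window] = self.cache      # burst write from the cache
            self.status |= STAT_FLUSHED


def read_block(ctrl, base):
    """Refresh the cache from SDRAM at base, poll, then read the cache."""
    ctrl.base = base
    ctrl.write_control(CTRL_REFRESH)
    while not ctrl.status & STAT_REFRESHED:      # poll the refreshed bit
        pass
    return list(ctrl.cache)


def write_block(ctrl, base, words):
    """Fill the cache, flush it to SDRAM at base, and poll for completion."""
    if len(words) != CACHE_WORDS:
        raise ValueError("this sketch only writes whole cache blocks")
    ctrl.base = base
    ctrl.cache[:] = words
    ctrl.write_control(CTRL_FLUSH)
    while not ctrl.status & STAT_FLUSHED:        # poll the flushed bit
        pass
```

    On the real system the three attributes would be GPMC-mapped registers and the two lists would be the block RAM and the SDRAM; the sequence of register accesses on the host side would stay the same.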
    Regards,

    Jonathan Piat


  • Hello Jonathan,

    I have followed your suggestion and instantiated a small cache (4KB for testing purposes) using block memory. I have not implemented the flushing mechanism yet; for now I am just accessing this small memory. It works fine, except that I had to increase the read access time from 80ns to 100ns; otherwise I get some artifacts when I read the data back from the FPGA and compare it to the original data. With 100ns, it works like a charm.

    I have used the same strategy file from the logi-wishbone project. Is this 100ns access time expected or should it be possible to read in 80ns? Simulation shows I can, but reality proves otherwise (read value '0x7878' is ready after 80 ns).
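
    The write/read-back comparison mentioned above can be sketched as follows. This is a generic illustration, not the test that was actually run: `write_word` and `read_word` are hypothetical callables standing in for whatever performs a 16-bit GPMC access, and the pattern is arbitrary.

```python
def find_mismatches(write_word, read_word, n_words):
    """Write a known pattern, read it back, and list the words that differ.

    write_word(addr, value) and read_word(addr) are hypothetical callables
    that perform one 16-bit access each. Returns (addr, expected, got) tuples.
    """
    expected = [(i * 0x0101) & 0xFFFF for i in range(n_words)]
    for addr, value in enumerate(expected):
        write_word(addr, value)

    mismatches = []
    for addr, want in enumerate(expected):
        got = read_word(addr)
        if got != want:
            mismatches.append((addr, want, got))
    return mismatches


if __name__ == "__main__":
    # Stand-in for the 4KB block memory: 2048 words held in a dict.
    memory = {}
    errors = find_mismatches(memory.__setitem__, memory.__getitem__, 2048)
    print(f"{len(errors)} mismatching words")
```

    Logging the failing addresses and values, rather than only a pass/fail flag, helps to tell a marginal read timing (stale or partially updated words) from an addressing problem.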

    Thank you,
    JC
    [Attachment: sim.JPG, simulation screenshot]
  • Great!

    Concerning the GPMC timings, I have set them to be a bit conservative but stable. The data should be ready after 80ns, as you noticed, but in reality there is an unknown phase relationship between the GPMC clock and the system clock. I am working on getting the data to the GPMC bus faster to allow faster accesses, but for now I cannot get it under 80ns. The way I did it is by changing how the data is brought back from the system clock domain to the GPMC clock domain. The repository version implements a dual-flop synchronizer in both directions, but getting the data to the GPMC should not require any synchronizer, as the read value is stable during the whole access.

    This can be done with the following change:

    readdata_bridge <= wbm_readdata;

    (and remove all the other parts of the code assigning readdata_bridge)


    Another way is to change it to:

    iob_readdata <= wbm_readdata;

    to save one GPMC clock cycle.
