Skip to content

Conversation

@errsu
Copy link

@errsu errsu commented Nov 20, 2025

As discussed in issue #35, here is the XTC 15.3.x and xcore-ai update for lib_sdram.
With timing calculation and test tool. Tested with XS2 and XS3 hardware.

I've ordered the commits from rather mandatory to rather optional.
Let me know how far you want to go. The sdram_timing tool in the last commit
seems funny as it allows to directly compare delay calculations and test results.

I could send some XS3 SDRAM testing hardware to XMOS if you like.

Yours,
Ralf

- tested with XTC 15.3.1
    on XP-SKC-X200 sliceboard (XS-2 CPU XE216-512-FB236)
    and XA-SK-SDRAM slice (64Mb IS42S16400F)

- .gitignore taken from other XMOS projects that already support
   XCommon CMake (mostly a superset of the previous version)

- to get it running, fixed test/sdram_multi_client
  + sdram_server called with client_count=7 according
    the the actual number of clients
  + run sdram_server in fast mode, otherwise there would be
    a short moment when enough clients (running in fast mode)
    are busy at the same time, so the sdram_server running in
    slow mode doesn't get enough cycles, resulting in errors

- cosmetic change in sdram_bench to actually use BUF_WORDS,
  so that all tests run with the same buffer size, which is not
  a multiple of the page size (which was probably the intention)
- as used to calculate the theoretical SDRAM read timings for xcore-ai
- with documentation of the signals and latencies in the comments
- for 600 MHz clock instead of the 500 MHz default of XS1/XS2
- tested with an XU316 board with 64Mbit Winbond W9864G6KH-6
  which is equivalent to an ISSI IS42S16400D-7
- depending on the clock divider, the READ/WRITE_SETUP_LATENCY
  could possibly be further optimized (reduced) for XS3 compared
  to XS2, as the thread cycles of XS3 are shorter by 17%,
  but this is relevant only for small buffer read/write commands
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds xcore-ai (XS3) support to lib_sdram, extending compatibility beyond XS2 (xCORE200) architecture. The changes include platform-specific timing parameter selection for the faster 600MHz XS3 core, a new timing calculation and test tool to validate SDRAM data input delays, and comprehensive updates to all examples and tests to support both XS2 and XS3 platforms.

Key changes:

  • New sdram_server_with_delays() API function for explicit timing control
  • Platform-specific default timing delays (600MHz for XS3, 500MHz for XS2)
  • New sdram_timing tool for calculating and validating read timing windows
  • Build system migration to CMake with target board selection

Reviewed Changes

Copilot reviewed 37 out of 38 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
lib_sdram/src/server.xc Refactored timing delay selection into select_default_delays(), added sdram_server_with_delays() API with explicit timing parameters
lib_sdram/src/io.S Updated assembly to support both XS2A and XS3A architectures
lib_sdram/api/sdram.h Added API documentation for new sdram_server_with_delays() function
tools/sdram_timing/* New timing calculation and testing tool with comprehensive documentation
tests/sdram_testbench/src/sdram_testbench.xc Added XS3 support, improved error reporting with counters, and verbose print helper
tests/sdram_multi_client_test/src/sdram_multi_client_test.xc Added XS3 port definitions and platform-specific clock dividers
tests/sdram_benchmark/src/sdram_bench.xc Added XS3 support and JTAG IO print mode handling
examples/sdram_demo/src/sdram_demo.xc Added XS3 platform support with appropriate clock dividers
CMakeLists.txt (multiple) New CMake build configuration files for all examples, tests, and tools
*.xn (multiple) New board definition files for xcore-ai SDRAM test board

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

}
if (VERBOSE_MSG)
printf("4 threaded test suite completed\n");
verbose_print("4 threaded test suite completed: %d\n");
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The format string in this verbose_print call is missing arguments. The function expects "4 threaded test suite completed: %d\n" which requires a success argument, but none is provided. This will cause undefined behavior or a compilation error.

Suggested change
verbose_print("4 threaded test suite completed: %d\n");
printf("4 threaded test suite completed: %d\n", success);

Copilot uses AI. Check for mistakes.
Comment on lines +28 to +47
static void select_default_delays(
const unsigned clock_divider,
unsigned& read_delay_whole_clocks,
unsigned& sample_delay,
unsigned& pad_delay) {
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function select_default_delays is missing documentation. Given that it plays a critical role in selecting timing parameters based on clock divider, it should have a comment explaining its purpose, parameters, and expected behavior.

Copilot uses AI. Check for mistakes.
@xross
Copy link
Contributor

xross commented Nov 21, 2025

Hi Ralf, thanks for the PR! We'll get someone assigned to look though the changes. HW offer would be great. We have something in the pipe but it's behind some other items at the moment. I sent you an email.

- XS2 tests were done with XP-SKC-X200 and SDRAM slice
  containing IS42S16400F-7, to check for regressions

- XS3 tests were done with a test board equipped with
  XU316-1024-FB265-C24 CPU and a 64 Mbit SDRAM
  Winbond W9864G6KH-6 (which is equivalent to IS42S16400D-7)

- the XS3 test board does not support xscope,
  so all tests were adapted to optinally work with JTAG I/O,

- to run tests with JTAG I/O, edit the respective config.xscope
  files to turn xscope off, set the #define JTAG_IO_PRINT (if any)
  to 1, and use xrun --io instead of xrun --xscope
- first of all, to avoid confusion
- this solves the potential issue that cas_latency
  is declared const static in the header and used
  as mutable variable in the implementation
- it will also be useful for later refactorings
- cas_latency parameter was not used in the write function
- this is an equivalent transform
- the delay selection function does not configure
  the actual delays, that is still done by sdram_init
- this means select_delays could actually be
  replaced by a table, indexed by the clock divider
  (maybe the compiler is doing just that)
- actually two tables, one for XS3 and one for XS1/XS2
- sdram_init is not concerned anymore about
  read_delay_whole_clocks
- by introducing a new function, sdram_server_with_delays
- it gets a none-const non-static clock divider
  and the three parameters which were before derived
  from this clock divider
- the original sdram_server is functionally unchanged
  and implemented in terms of the new function,
  using the now called select_default_delays function
- clock_divider is not used in time-critical code,
  all other const-static parameters are unchanged
  and therefore no new performance issues are created
- the new sdram_server_with_delays function has
  following use cases
  + the SDRAM read timing depends very much on the users
    situation - SDRAM IC selection, PCB line capacities,
    XMOS CPU pin selection, pin driver current selection,
    temperature ranges, supply voltage levels - therefore
    it is an advantage if the user can specify these
    parameters in the application without a need to modify
    the lib_sdram library
  + the function allows to write a tool that would test
    various settings to see which timing ranges actually
    do work, to verify the theoretical considerations
  + it could make it possibly to reach higher SDRAM speeds
    using an adaptive timing, which means the working
    timing range is measured before the actual SDRAM
    use or in idle phases, by trial-and-error with
    varying settings
- that tests read delay parameter combinations
  for clock dividers from 4 to 12 and prints
  out which work and which not
- does a single write/read test with fixed and walking 1 pattern
  and whole memory test with random pattern
- this allows to compare theoretical and
  practical results easily
- provide calculation and actual test results compared to
  default values chosen in lib_sdram/src/server.xc
- explicitely limit thread speed of sdram_server
  to FCore/8, e.g. 63 MIPS at 500 MHz
- this is tested to be still fast enough for all
  scenarios in the test, including 100MHz Tclk
  at 800MHz Tcore
- document the respective code
@errsu
Copy link
Author

errsu commented Nov 23, 2025

Sorry for force pushing, I've taken the useful Copilot suggestions into account.

- The calculation doesn't find a solution for SDRAM clock
  speeds higher than 57.14 MHz. The hardware actually works
  in a read time window of 5ns width at 100MHz Fclk.
  This is definitely too small for the variation of
  production units, but could possibly work with an adaptive
  firmware that adjusts SDRAM timing at boot time.

- As reference, the successfully tested read time window
  at 33 Mhz Tclk has a width of 17.5 ns.

- Interesting enough, the successfully tested read window shifts
  towards earlier read times with higher Tclk speeds.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants