-
Notifications
You must be signed in to change notification settings - Fork 9
Xcore ai support #36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Xcore ai support #36
Conversation
- tested with XTC 15.3.1
on XP-SKC-X200 sliceboard (XS-2 CPU XE216-512-FB236)
and XA-SK-SDRAM slice (64Mb IS42S16400F)
- .gitignore taken from other XMOS projects that already support
XCommon CMake (mostly a superset of the previous version)
- to get it running, fixed test/sdram_multi_client
+ sdram_server called with client_count=7 according
the the actual number of clients
+ run sdram_server in fast mode, otherwise there would be
a short moment when enough clients (running in fast mode)
are busy at the same time, so the sdram_server running in
slow mode doesn't get enough cycles, resulting in errors
- cosmetic change in sdram_bench to actually use BUF_WORDS,
so that all tests run with the same buffer size, which is not
a multiple of the page size (which was probably the intention)
- as used to calculate the theoretical SDRAM read timings for xcore-ai - with documentation of the signals and latencies in the comments
- for 600 MHz clock instead of the 500 MHz default of XS1/XS2 - tested with an XU316 board with 64Mbit Winbond W9864G6KH-6 which is equivalent to an ISSI IS42S16400D-7 - depending on the clock divider, the READ/WRITE_SETUP_LATENCY could possibly be further optimized (reduced) for XS3 compared to XS2, as the thread cycles of XS3 are shorter by 17%, but this is relevant only for small buffer read/write commands
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds xcore-ai (XS3) support to lib_sdram, extending compatibility beyond XS2 (xCORE200) architecture. The changes include platform-specific timing parameter selection for the faster 600MHz XS3 core, a new timing calculation and test tool to validate SDRAM data input delays, and comprehensive updates to all examples and tests to support both XS2 and XS3 platforms.
Key changes:
- New
sdram_server_with_delays()API function for explicit timing control - Platform-specific default timing delays (600MHz for XS3, 500MHz for XS2)
- New sdram_timing tool for calculating and validating read timing windows
- Build system migration to CMake with target board selection
Reviewed Changes
Copilot reviewed 37 out of 38 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| lib_sdram/src/server.xc | Refactored timing delay selection into select_default_delays(), added sdram_server_with_delays() API with explicit timing parameters |
| lib_sdram/src/io.S | Updated assembly to support both XS2A and XS3A architectures |
| lib_sdram/api/sdram.h | Added API documentation for new sdram_server_with_delays() function |
| tools/sdram_timing/* | New timing calculation and testing tool with comprehensive documentation |
| tests/sdram_testbench/src/sdram_testbench.xc | Added XS3 support, improved error reporting with counters, and verbose print helper |
| tests/sdram_multi_client_test/src/sdram_multi_client_test.xc | Added XS3 port definitions and platform-specific clock dividers |
| tests/sdram_benchmark/src/sdram_bench.xc | Added XS3 support and JTAG IO print mode handling |
| examples/sdram_demo/src/sdram_demo.xc | Added XS3 platform support with appropriate clock dividers |
| CMakeLists.txt (multiple) | New CMake build configuration files for all examples, tests, and tools |
| *.xn (multiple) | New board definition files for xcore-ai SDRAM test board |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } | ||
| if (VERBOSE_MSG) | ||
| printf("4 threaded test suite completed\n"); | ||
| verbose_print("4 threaded test suite completed: %d\n"); |
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The format string in this verbose_print call is missing arguments. The function expects "4 threaded test suite completed: %d\n" which requires a success argument, but none is provided. This will cause undefined behavior or a compilation error.
| verbose_print("4 threaded test suite completed: %d\n"); | |
| printf("4 threaded test suite completed: %d\n", success); |
| static void select_default_delays( | ||
| const unsigned clock_divider, | ||
| unsigned& read_delay_whole_clocks, | ||
| unsigned& sample_delay, | ||
| unsigned& pad_delay) { |
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function select_default_delays is missing documentation. Given that it plays a critical role in selecting timing parameters based on clock divider, it should have a comment explaining its purpose, parameters, and expected behavior.
|
Hi Ralf, thanks for the PR! We'll get someone assigned to look though the changes. HW offer would be great. We have something in the pipe but it's behind some other items at the moment. I sent you an email. |
- XS2 tests were done with XP-SKC-X200 and SDRAM slice containing IS42S16400F-7, to check for regressions - XS3 tests were done with a test board equipped with XU316-1024-FB265-C24 CPU and a 64 Mbit SDRAM Winbond W9864G6KH-6 (which is equivalent to IS42S16400D-7) - the XS3 test board does not support xscope, so all tests were adapted to optinally work with JTAG I/O, - to run tests with JTAG I/O, edit the respective config.xscope files to turn xscope off, set the #define JTAG_IO_PRINT (if any) to 1, and use xrun --io instead of xrun --xscope
- first of all, to avoid confusion - this solves the potential issue that cas_latency is declared const static in the header and used as mutable variable in the implementation - it will also be useful for later refactorings - cas_latency parameter was not used in the write function - this is an equivalent transform
- the delay selection function does not configure the actual delays, that is still done by sdram_init - this means select_delays could actually be replaced by a table, indexed by the clock divider (maybe the compiler is doing just that) - actually two tables, one for XS3 and one for XS1/XS2 - sdram_init is not concerned anymore about read_delay_whole_clocks
- by introducing a new function, sdram_server_with_delays
- it gets a none-const non-static clock divider
and the three parameters which were before derived
from this clock divider
- the original sdram_server is functionally unchanged
and implemented in terms of the new function,
using the now called select_default_delays function
- clock_divider is not used in time-critical code,
all other const-static parameters are unchanged
and therefore no new performance issues are created
- the new sdram_server_with_delays function has
following use cases
+ the SDRAM read timing depends very much on the users
situation - SDRAM IC selection, PCB line capacities,
XMOS CPU pin selection, pin driver current selection,
temperature ranges, supply voltage levels - therefore
it is an advantage if the user can specify these
parameters in the application without a need to modify
the lib_sdram library
+ the function allows to write a tool that would test
various settings to see which timing ranges actually
do work, to verify the theoretical considerations
+ it could make it possibly to reach higher SDRAM speeds
using an adaptive timing, which means the working
timing range is measured before the actual SDRAM
use or in idle phases, by trial-and-error with
varying settings
- that tests read delay parameter combinations for clock dividers from 4 to 12 and prints out which work and which not - does a single write/read test with fixed and walking 1 pattern and whole memory test with random pattern
- this allows to compare theoretical and practical results easily - provide calculation and actual test results compared to default values chosen in lib_sdram/src/server.xc
- explicitely limit thread speed of sdram_server to FCore/8, e.g. 63 MIPS at 500 MHz - this is tested to be still fast enough for all scenarios in the test, including 100MHz Tclk at 800MHz Tcore - document the respective code
7f92ea8 to
077134a
Compare
|
Sorry for force pushing, I've taken the useful Copilot suggestions into account. |
- The calculation doesn't find a solution for SDRAM clock speeds higher than 57.14 MHz. The hardware actually works in a read time window of 5ns width at 100MHz Fclk. This is definitely too small for the variation of production units, but could possibly work with an adaptive firmware that adjusts SDRAM timing at boot time. - As reference, the successfully tested read time window at 33 Mhz Tclk has a width of 17.5 ns. - Interesting enough, the successfully tested read window shifts towards earlier read times with higher Tclk speeds.
077134a to
35375b1
Compare
As discussed in issue #35, here is the XTC 15.3.x and xcore-ai update for lib_sdram.
With timing calculation and test tool. Tested with XS2 and XS3 hardware.
I've ordered the commits from rather mandatory to rather optional.
Let me know how far you want to go. The sdram_timing tool in the last commit
seems funny as it allows to directly compare delay calculations and test results.
I could send some XS3 SDRAM testing hardware to XMOS if you like.
Yours,
Ralf