Hello,
I am trying to use cuda-ecm and I am getting strange results; it looks like my numbers are not actually being processed.
My input.txt file contains:
0 18385953704149511432003956482802644489185349949908915625820139932293445782211552617095811061291497717637
1 18385953704149511432003956482802644489185349949908915625820139932293445782211552617095811061291497717049
2 18385953704149511432003956482802644489185349949908915625820139932293445782211552617095811061291497722083
3 18385953704149511432003956482802644489185349949908915625820139932293445782211552617095811061291497717393
4 18385953704149511432003956482802644489185349949908915625820139932293445782211552617095811061291497718659
...
305 18385953704149511432003956482802644489185349949908915625820139932293445782211552617095811061291497716427
306 18385953704149511432003956482802644489185349949908915625820139932293445782211552617095811061291497715369
307 18385953704149511432003956482802644489185349949908915625820139932293445782211552617095811061291497715999
308 18385953704149511432003956482802644489185349949908915625820139932293445782211552617095811061291497713119
I want to find their factors.
They are 344 bits long.
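That bit length is easy to double-check; a minimal sketch using the first number copied from input.txt above:

```python
# Bit length of the first number listed in input.txt above
n = 18385953704149511432003956482802644489185349949908915625820139932293445782211552617095811061291497717637
print(n.bit_length())  # -> 344
```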
When I launch the process, there is no warning and no error:
2023-09-26 14:01:11.836315 [INFO] Writing to logfile ./log.txt
2023-09-26 14:01:11.836352 [WARN] Runtime log level cannot be set lower than compile time setting.
2023-09-26 14:01:11.836389 [WARN] Continuing with level 3
2023-09-26 14:01:11.836436 [INFO] Using static seed for RNG.
2023-09-26 14:01:11.836479 [INFO] Finding all factors
2023-09-26 14:01:11.836544 [INFO] [Config]
2023-09-26 14:01:11.836574 [INFO] input: ./input.txt
2023-09-26 14:01:11.836602 [INFO] output: ./output.txt
2023-09-26 14:01:11.836630 [INFO] [Server]
2023-09-26 14:01:11.836636 [INFO] port: 11111
2023-09-26 14:01:11.836641 [INFO] [ECM]
2023-09-26 14:01:11.836646 [INFO] effort_max: 1000
2023-09-26 14:01:11.836672 [INFO] [Stage 1]
2023-09-26 14:01:11.836679 [INFO] b1: 50000 (powersmoothness)
2023-09-26 14:01:11.836684 [INFO] Checking all points for factors
2023-09-26 14:01:11.836712 [INFO] [Stage 2]
2023-09-26 14:01:11.836742 [INFO] enabled
2023-09-26 14:01:11.836770 [INFO] b2: 500000
2023-09-26 14:01:11.836783 [INFO] window size: 2310
2023-09-26 14:01:11.836820 [INFO] Checking all points for factors
2023-09-26 14:01:11.836849 [INFO] [CUDA]
2023-09-26 14:01:11.836885 [INFO] n_cuda_streams: 2
2023-09-26 14:01:11.836916 [INFO] curve_gen: GKL2016_j4
2023-09-26 14:01:11.836954 [INFO] cuda_threads_per_block: 128
2023-09-26 14:01:11.836985 [INFO] cuda_blocks: 256
2023-09-26 14:01:11.837120 [INFO] CUDA Device 0
2023-09-26 14:01:11.837151 [INFO] Name: Quadro M4000
2023-09-26 14:01:11.837178 [INFO] Global Memory: 8589672448 bytes
2023-09-26 14:01:11.837188 [INFO] Constant Memory: 65536 bytes
2023-09-26 14:01:11.837214 [INFO] Shared Mem per Block: 49152 bytes
2023-09-26 14:01:11.837242 [INFO] 32bit Registers per Block: 65536
2023-09-26 14:01:11.837250 [INFO] Max Threads per Block: 1024
2023-09-26 14:01:11.837287 [INFO] Warpsize: 32 threads
2023-09-26 14:01:11.837315 [INFO] Multiprocessors: 13
2023-09-26 14:01:11.837637 [INFO] == Using device 0 ==
2023-09-26 14:01:11.837669 [INFO] CUDA Configuration
2023-09-26 14:01:11.837676 [INFO] Concurrent Streams: 2
2023-09-26 14:01:11.837702 [INFO] Curves per Batch: 32768
2023-09-26 14:01:11.837730 [INFO] Threads per Block: 128
2023-09-26 14:01:11.837767 [INFO] Blocks per Stream: 256
2023-09-26 14:01:11.837799 [INFO] ECM Configuration
2023-09-26 14:01:11.837807 [INFO] B1 (powersmooth): 50000
2023-09-26 14:01:11.837812 [INFO] B2: 500000
2023-09-26 14:01:11.837817 [INFO] Max Effort: 1000
2023-09-26 14:01:11.837822 [INFO] Curve generator: GKL2016_j4
2023-09-26 14:01:11.837904 [INFO] CUDA using 128 threads per block.
2023-09-26 14:01:11.837935 [INFO] CUDA using 256 blocks.
2023-09-26 14:01:11.838024 [WARN] input thread start
2023-09-26 14:01:11.874378 [INFO] Initial Task Queue: 309 tasks
2023-09-26 14:01:11.939323 [INFO] Stage 1 Initialization of device #0
2023-09-26 14:01:12.063646 [INFO] [Device 0] Stage 2 Initialization...
2023-09-26 14:01:12.212689 [INFO] [Device 0] Stage 2 Giantstep buffer: 5093523456B free memory, using 7864320B (197 points) per thread)
2023-09-26 14:01:12.224743 [INFO] Stage 2 init done
2023-09-26 14:01:13.959745 [INFO] [Thread 1] 32768 tasks in batch for Stage 1
2023-09-26 14:01:13.002457 [INFO] [Thread 0] 32768 tasks in batch for Stage 1
2023-09-26 14:01:16.163838 [INFO] [Device 0] [Thread 1] Stage 1 Performance: 32768 curves in 2204ms (14867 c/s)
2023-09-26 14:01:16.209453 [INFO] [Device 0] [Thread 0] Stage 1 Performance: 32768 curves in 2207ms (14848 c/s)
2023-09-26 14:01:17.826552 [INFO] [Device 0] [Thread 1] Stage 2 Performance: 32768 curves in 1663ms (19708 c/s)
2023-09-26 14:01:17.947187 [INFO] [Device 0] [Thread 0] Stage 2 Performance: 32768 curves in 1738ms (18857 c/s)
2023-09-26 14:01:17.947238 [INFO] Task Queue: 242 tasks
2023-09-26 14:01:19.576396 [INFO] [Thread 1] 32768 tasks in batch for Stage 1
2023-09-26 14:01:19.730936 [INFO] [Thread 0] 32768 tasks in batch for Stage 1
2023-09-26 14:01:21.507658 [INFO] [Device 0] [Thread 1] Stage 1 Performance: 32768 curves in 1931ms (16968 c/s)
2023-09-26 14:01:21.756818 [INFO] [Device 0] [Thread 0] Stage 1 Performance: 32768 curves in 2026ms (16175 c/s)
2023-09-26 14:01:23.216437 [INFO] [Device 0] [Thread 1] Stage 2 Performance: 32768 curves in 1709ms (19177 c/s)
2023-09-26 14:01:23.437845 [INFO] [Device 0] [Thread 0] Stage 2 Performance: 32768 curves in 1681ms (19495 c/s)
2023-09-26 14:01:23.437921 [INFO] Task Queue: 175 tasks
2023-09-26 14:01:24.976318 [INFO] [Thread 1] 32768 tasks in batch for Stage 1
2023-09-26 14:01:25.210125 [INFO] [Thread 0] 32768 tasks in batch for Stage 1
2023-09-26 14:01:27.264052 [INFO] [Device 0] [Thread 1] Stage 1 Performance: 32768 curves in 2288ms (14324 c/s)
2023-09-26 14:01:27.503159 [INFO] [Device 0] [Thread 0] Stage 1 Performance: 32768 curves in 2293ms (14291 c/s)
2023-09-26 14:01:28.993983 [INFO] [Device 0] [Thread 1] Stage 2 Performance: 32768 curves in 1730ms (18943 c/s)
2023-09-26 14:01:29.237622 [INFO] [Device 0] [Thread 0] Stage 2 Performance: 32768 curves in 1734ms (18893 c/s)
2023-09-26 14:01:29.237674 [INFO] Task Queue: 108 tasks
2023-09-26 14:01:30.736050 [INFO] [Thread 1] 32768 tasks in batch for Stage 1
2023-09-26 14:01:30.982581 [INFO] [Thread 0] 32768 tasks in batch for Stage 1
2023-09-26 14:01:32.848146 [INFO] [Device 0] [Thread 1] Stage 1 Performance: 32768 curves in 2112ms (15515 c/s)
2023-09-26 14:01:33.025715 [INFO] [Device 0] [Thread 0] Stage 1 Performance: 32768 curves in 2043ms (16039 c/s)
2023-09-26 14:01:34.542253 [INFO] [Device 0] [Thread 1] Stage 2 Performance: 32768 curves in 1694ms (19343 c/s)
2023-09-26 14:01:34.724778 [INFO] [Device 0] [Thread 0] Stage 2 Performance: 32768 curves in 1699ms (19287 c/s)
2023-09-26 14:01:34.724830 [INFO] Task Queue: 44 tasks
2023-09-26 14:01:35.898014 [INFO] [Thread 0] 20915 tasks in batch for Stage 1
2023-09-26 14:01:35.898097 [INFO] [Thread 1] 25941 tasks in batch for Stage 1
2023-09-26 14:01:37.175388 [INFO] [Device 0] [Thread 0] Stage 1 Performance: 20915 curves in 1277ms (16375 c/s)
2023-09-26 14:01:37.479382 [INFO] [Device 0] [Thread 1] Stage 1 Performance: 25941 curves in 1581ms (16406 c/s)
2023-09-26 14:01:38.303604 [INFO] [Device 0] [Thread 0] Stage 2 Performance: 20915 curves in 1128ms (18539 c/s)
2023-09-26 14:01:38.303662 [INFO] Task Queue: 0 tasks
2023-09-26 14:01:38.844777 [INFO] [Device 0] [Thread 1] Stage 2 Performance: 25941 curves in 1365ms (19000 c/s)
2023-09-26 14:01:38.844932 [INFO] [Total] Stage 1&2 Performance: 309000 curves in 26620ms (11608 c/s)
2023-09-26 14:01:38.844976 [INFO] Final Task Queue: 0 tasks
but output.txt looks like this:
308 1
307 1
0 1
305 1
304 1
306 1
301 1
...
8 1
10 1
7 1
6 1
5 1
4 1
2 1
3 1
1 1
DONE
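For comparison, the output format documented in the config comments below is `(input number);(factor),(factor), # effort: (number of curves)`, not `index factor`. A minimal, hypothetical parser sketch for that documented format (the function name is mine; the example line is taken from the config comments):

```python
# Parse a line in the documented co-ecm output format:
#   (input number);(factor),(factor), # effort: (number of curves)
def parse_output_line(line):
    head, _, effort = line.partition("# effort:")
    number, _, factor_str = head.partition(";")
    factors = [int(f) for f in factor_str.split(",") if f.strip()]
    return int(number), factors, int(effort)

n, factors, effort = parse_output_line(
    "44800523911798220433379600867;224536506062699,199524454608233, # effort: 12"
)
print(n, factors, effort)
```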
Config.ini:
; Example config file for co-ecm
[general]
; server or file
mode = file
; Logfile location
logfile = ./log.txt
; Output file of abandoned, i.e. unsolved tasks.
; Format is the same as the output format, without listing factors
;
; Example line:
; 44800523911798220433379600867; # effort 112
output_abandoned = ./bw192easy-stage2.abandoned.txt
; Log level
;
; 1: "VERBOSE",
; 2: "DEBUG",
; 3: "INFO",
; 4: "WARNING",
; 5: "ERROR",
; 6: "FATAL",
; 7: "NONE"
; Default is set at compile time.
loglevel = 1
; Use a random seed for the random number generator used to generate points and
; curves. If set to 'false', each run of the program will behave the same
; provided the same input data.
; Default: true
random = false
[server]
port = 11111
[file]
; Input file.
; The input file should contain a single number to be factored per line. Lines
; starting with anything but a digit are skipped.
;
; Example line:
; 44800523911798220433379600867
input = ./input.txt
; Output file.
; Each fully factored input number is appended to the output on its own line in
; the format
; (input number);(factor),(factor),(factor), # effort: (number of curves)
;
; Example line:
; 44800523911798220433379600867;224536506062699,199524454608233, # effort: 12
output = ./output.txt
[cuda]
; Number of concurrent cuda streams to issue to GPU
; Default: 2
streams = 2
; Number of threads per block for cuda kernel launches.
; Set to auto to determine setting for maximum parallel resident blocks per SM at runtime.
; Note: The settings determined by 'auto' are not always automatically the optimal setting for maximum throughput.
; Default: auto
threads_per_block = 128
; Constant memory is used for (smaller) scalars during point multiplication.
; When the scalar is too large to fit into constant memory or this option is set
; to 'false', global device memory is used.
; Default: true
use_const_memory = false
[ecm]
; Redo ECM until numbers are fully factored.
; If set to false, only the first factor is returned.
; Default: false
find_all_factors = true
; Set the computation of the scalar s for point multiplication. With
; 'powersmooth' set to 'true', then s = lcm(2, ..., b1). If set to false,
; s = primorial(2, ..., b1), i.e. the product of all primes less than or equal
; to b1.
; Default: true
powersmooth = true
; Bound b1 for stage 1 of ecm.
b1 = 50000
; Bound b2 for stage 2 of ecm.
b2 = 500000
; Maximum effort per input number.
; With each curve, the already spent effort is incremented. Thus, with effort
; set to 100, ecm stage1 (and stage2) will be executed on 100 curves per input
; number.
; Default: 10
effort = 1000
; Set the curve generator function.
; Use 2 under normal circumstances.
; 0: "Naive"
; 1: "GKL2016_j1"
; 2: "GKL2016_j4"
; Default: 2
curve_gen = 2
; Use only points for finding factors that are off curve.
; After point multiplication, use all resulting points to find factors. If set
; to 'false' coordinates of points will be checked that do not fulfill the curve
; equation.
; Settings for stage1 and stage2 respectively.
; Default: true
stage1.check_all = true
stage2.check_all = true
; Enable/Disable stage 2.
; If set to 'false', only stage 1 of ECM is performed.
; Default: true
stage2.enabled = true
; Set the window size for stage 2
; Default: 2310
;stage2.window_size = 2310
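The `powersmooth` option above determines the stage 1 scalar s. A small sketch of the two variants described in the config comments, using a toy bound b1 = 20 instead of 50000, just for illustration:

```python
from math import gcd
from functools import reduce

def lcm(a, b):
    return a * b // gcd(a, b)

def primes_upto(n):
    # Simple sieve of Eratosthenes
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [i for i, is_p in enumerate(sieve) if is_p]

b1 = 20  # toy bound; the config above uses b1 = 50000
# powersmooth = true:  s = lcm(2, ..., b1)
s_powersmooth = reduce(lcm, range(2, b1 + 1))
# powersmooth = false: s = product of all primes <= b1
s_primorial = reduce(lambda a, b: a * b, primes_upto(b1))
print(s_powersmooth, s_primorial)  # -> 232792560 9699690
```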
/ecmongpu-master/CMakeLists.txt:
cmake_minimum_required(VERSION 0.7)
project(co-ecm C CUDA)
set(VERSION "v0.8")
set(CMAKE_BUILD_TYPE Release)
set(BUILD_BENCHMARKS 1)
# Options and Settings
# Log level: VERBOSE, DEBUG, INFO, WARNING, ERROR, FATAL
set(LOG_LEVEL INFO)
# Multi precision limbs, use 32 or 64 bit datatype
set(LIMB_BITS 32)
# Bits of basic multi precision datatype. Has to be 32 bit more than the modulus to be factored
if(DEFINED ENV{BITWIDTH})
set(BITWIDTH $ENV{BITWIDTH})
else()
set(BITWIDTH 352)
endif()
message("Building for ${BITWIDTH}-bit moduli")
# Default CUDA threads per block.
set(BLOCK_SIZE 128)
# Curves per single batch.
set(BATCH_JOB_SIZE 32768)
# Window size for w-NAF.
if(DEFINED ENV{WINDOW_SIZE})
set(NAF_WINDOW_SIZE $ENV{WINDOW_SIZE})
else()
set(NAF_WINDOW_SIZE 4)
endif()
message("Building with window size w=${NAF_WINDOW_SIZE}")
# Default allocated number of NAF digits for ECM stage 2.
# If needed for large Stage 2 bounds, you will be asked to increase this value and recompile.
set(NAF_STAGE2_DEFAULT_DIGITS 20)
# Use optimized precomputation
if(NOT DEFINED ENV{DISABLE_OPTIMIZED_PRECOMP})
set(OPTIMIZE_PRECOMP 1)
endif()
if(NOT DEFINED OPTIMIZE_PRECOMP)
message("Optimized point representation disabled")
endif()
# Choose a Montgomery product algorithm
if(DEFINED ENV{MON_PROD})
if("$ENV{MON_PROD}" STREQUAL "CIOS")
set(MON_PROD_CIOS 1)
endif()
if("$ENV{MON_PROD}" STREQUAL "CIOS_XMAD")
set(MON_PROD_CIOS_XMAD 1)
endif()
if("$ENV{MON_PROD}" STREQUAL "FIPS")
set(MON_PROD_FIPS 1)
endif()
if("$ENV{MON_PROD}" STREQUAL "FIOS")
set(MON_PROD_FIOS 1)
endif()
else ()
set(MON_PROD_FIPS 1)
endif()
set(COORDINATES_EXTENDED 1)
#set(COORDINATES_INVERTED 1)
# Set the maximum number of registers during compilation.
# Low values result in excessive spilling to (slow local, i.e. global) memory.
#set(GPU_MAX_REG 64)
# Set CUDA architectures to generate binary code for
if(DEFINED ENV{GPU_ARCH})
set(GPU_ARCHITECTURE $ENV{GPU_ARCH})
message("Building for CUDA architecture ${GPU_ARCHITECTURE}")
else()
execute_process(
COMMAND bash -c "${CUDA_TOOLKIT_ROOT_DIR}/extras/demo_suite/deviceQuery | grep 'CUDA Capability' | sed -rn ':a;N;$!ba;s/.:\s(.+)\.(.+).*/sm_\1\2/p'"
OUTPUT_VARIABLE GPU_ARCHITECTURE
ERROR_QUIET
OUTPUT_STRIP_TRAILING_WHITESPACE
)
if("${GPU_ARCHITECTURE}" STREQUAL "")
set(GPU_ARCHITECTURE "sm_60;sm_61;sm_70;sm_75")
message("Could not detect CUDA device architecture, using ${GPU_ARCHITECTURE}")
else()
message("Detected CUDA architecture ${GPU_ARCHITECTURE}")
endif()
endif()
# Build setup
# Do not edit
list(APPEND CMAKE_MODULE_PATH "${PROJECT_SOURCE_DIR}/cmake")
find_package(GMP REQUIRED)
# Set CUDA compiler flags
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -lineinfo -Xptxas=-v -lineinfo --keep --compiler-options='-Wall -Wno-unknown-pragmas'")
if(NOT "${GPU_MAX_REG}" STREQUAL "")
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -maxrregcount=${GPU_MAX_REG}")
endif()
foreach(ARCH IN LISTS GPU_ARCHITECTURE)
string(REPLACE "sm_" "compute_" COMPUTE ${ARCH})
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} --generate-code arch=${COMPUTE},code=${ARCH}")
endforeach(ARCH)
include(CTest)
# Generate version.h
include(${CMAKE_MODULE_PATH}/version.cmake)
# Generate build_config.h
configure_file(
include/build_config.h.in
${CMAKE_CURRENT_BINARY_DIR}/generated/build_config.h
)
set(CMAKE_C_STANDARD 99)
# Concatenate all .cu files into a single kernel file for nvcc to
# work around the nvcc limitations of slow non-separable builds.
# (nvcc does not know link-time optimization)
function(add_cuda_executable TARGET)
set(CURRENT_C_SOURCES ${ARGN})
set(CURRENT_CUDA_SOURCES ${ARGN})
list(FILTER CURRENT_C_SOURCES EXCLUDE REGEX ".\.cu")
list(FILTER CURRENT_CUDA_SOURCES INCLUDE REGEX ".\.cu")
file(MAKE_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/generated)
add_custom_command(
OUTPUT generated/${TARGET}-cudakernel.cu
COMMAND cat ${CURRENT_CUDA_SOURCES} > ${CMAKE_CURRENT_BINARY_DIR}/generated/${TARGET}-cudakernel.cu
WORKING_DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}"
DEPENDS ${CURRENT_CUDA_SOURCES}
)
add_executable(${TARGET} ${CURRENT_C_SOURCES} generated/${TARGET}-cudakernel.cu)
endfunction()
# Set include directories
include_directories(${PROJECT_SOURCE_DIR}/include)
include_directories(${CMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES})
include_directories(${CMAKE_CURRENT_BINARY_DIR}/generated)
find_package (Threads)
link_libraries(gmp)
link_libraries(${CMAKE_THREAD_LIBS_INIT})
# Set common source files
set(COMMON_CUDA_SOURCES
${CMAKE_CURRENT_SOURCE_DIR}/src/mp/mp.cu
${CMAKE_CURRENT_SOURCE_DIR}/src/mp/mp_montgomery.cu
${CMAKE_CURRENT_SOURCE_DIR}/src/ecc/naf.cu
${CMAKE_CURRENT_SOURCE_DIR}/src/ecc/tw_ed_common.cu
${CMAKE_CURRENT_SOURCE_DIR}/src/ecm/ecm.cu
${CMAKE_CURRENT_SOURCE_DIR}/src/ecm/factor_task.cu
${CMAKE_CURRENT_SOURCE_DIR}/src/ecm/batch.cu
${CMAKE_CURRENT_SOURCE_DIR}/src/ecm/stage1.cu
${CMAKE_CURRENT_SOURCE_DIR}/src/ecm/stage2.cu
${CMAKE_CURRENT_SOURCE_DIR}/src/config/config.cu
${CMAKE_CURRENT_SOURCE_DIR}/src/config/handler.cu
)
# Set coordinate specific ecc and ecm implementations
if(COORDINATES_EXTENDED)
set(COMMON_CUDA_SOURCES
${COMMON_CUDA_SOURCES}
${CMAKE_CURRENT_SOURCE_DIR}/src/ecc/tw_ed_extended.cu
${CMAKE_CURRENT_SOURCE_DIR}/src/ecm/tw_ed_extended.cu
)
elseif(COORDINATES_INVERTED)
set(COMMON_CUDA_SOURCES
${COMMON_CUDA_SOURCES}
${CMAKE_CURRENT_SOURCE_DIR}/src/ecc/tw_ed_inverted.cu
${CMAKE_CURRENT_SOURCE_DIR}/src/ecm/tw_ed_inverted.cu
)
endif()
set(COMMON_C_SOURCES
${CMAKE_CURRENT_SOURCE_DIR}/src/gmp_conv/gmp_conversion.c
${CMAKE_CURRENT_SOURCE_DIR}/src/config/ini.c
${CMAKE_CURRENT_SOURCE_DIR}/src/log.c
${CMAKE_CURRENT_SOURCE_DIR}/src/input/file.c
${CMAKE_CURRENT_SOURCE_DIR}/src/input/tcp.c
${CMAKE_CURRENT_SOURCE_DIR}/src/input/parser.c
)
# Set output directory
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/bin)
add_subdirectory(src)
add_subdirectory(tests)
add_subdirectory(bench)
add_subdirectory(resource)
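One thing worth double-checking, assuming the BITWIDTH comment in the CMakeLists above is accurate ("has to be 32 bit more than the modulus to be factored"): the default BITWIDTH of 352 may be too small for 344-bit inputs. A quick arithmetic sketch:

```python
# The CMakeLists comment above states BITWIDTH must be 32 bits
# larger than the modulus to be factored.
modulus_bits = 344   # bit length of the numbers in input.txt
bitwidth = 352       # default BITWIDTH in the CMakeLists above
required = modulus_bits + 32
print(required, bitwidth >= required)  # -> 376 False
```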
Each time I change the CMakeLists file, I rerun cmake and make to regenerate everything.
I am not familiar with the mathematics, so I probably made an important mistake in my settings.
Thank you