
Some strange results #1

@davhello

Hello,

I am trying to use cuda-ecm and I am getting strange results; it looks like the numbers may not actually be processed.

Inside my input.txt file I have:

0 18385953704149511432003956482802644489185349949908915625820139932293445782211552617095811061291497717637
1 18385953704149511432003956482802644489185349949908915625820139932293445782211552617095811061291497717049
2 18385953704149511432003956482802644489185349949908915625820139932293445782211552617095811061291497722083
3 18385953704149511432003956482802644489185349949908915625820139932293445782211552617095811061291497717393
4 18385953704149511432003956482802644489185349949908915625820139932293445782211552617095811061291497718659
...
305 18385953704149511432003956482802644489185349949908915625820139932293445782211552617095811061291497716427
306 18385953704149511432003956482802644489185349949908915625820139932293445782211552617095811061291497715369
307 18385953704149511432003956482802644489185349949908915625820139932293445782211552617095811061291497715999
308 18385953704149511432003956482802644489185349949908915625820139932293445782211552617095811061291497713119

I want to find their factors; they are 344 bits long.
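As a quick sanity check I verified that size in Python (nothing specific to cuda-ecm, just the integer itself):

```python
# Check the bit length of the first input number from input.txt.
n = 18385953704149511432003956482802644489185349949908915625820139932293445782211552617095811061291497717637
print(n.bit_length())  # 344
```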

When I launch the process I get no warning and no error:

2023-09-26 14:01:11.836315 [INFO] Writing to logfile ./log.txt
2023-09-26 14:01:11.836352 [WARN] Runtime log level cannot be set lower than compile time setting.
2023-09-26 14:01:11.836389 [WARN] Continuing with level 3
2023-09-26 14:01:11.836436 [INFO] Using static seed for RNG.
2023-09-26 14:01:11.836479 [INFO] Finding all factors
2023-09-26 14:01:11.836544 [INFO] [Config]
2023-09-26 14:01:11.836574 [INFO] input: ./input.txt
2023-09-26 14:01:11.836602 [INFO] output: ./output.txt
2023-09-26 14:01:11.836630 [INFO] [Server]
2023-09-26 14:01:11.836636 [INFO] port: 11111
2023-09-26 14:01:11.836641 [INFO] [ECM]
2023-09-26 14:01:11.836646 [INFO] effort_max: 1000
2023-09-26 14:01:11.836672 [INFO] [Stage 1]
2023-09-26 14:01:11.836679 [INFO] b1: 50000 (powersmoothness)
2023-09-26 14:01:11.836684 [INFO] Checking all points for factors
2023-09-26 14:01:11.836712 [INFO] [Stage 2]
2023-09-26 14:01:11.836742 [INFO] enabled
2023-09-26 14:01:11.836770 [INFO] b2: 500000
2023-09-26 14:01:11.836783 [INFO] window size: 2310
2023-09-26 14:01:11.836820 [INFO] Checking all points for factors
2023-09-26 14:01:11.836849 [INFO] [CUDA]
2023-09-26 14:01:11.836885 [INFO] n_cuda_streams: 2
2023-09-26 14:01:11.836916 [INFO] curve_gen: GKL2016_j4
2023-09-26 14:01:11.836954 [INFO] cuda_threads_per_block: 128
2023-09-26 14:01:11.836985 [INFO] cuda_blocks: 256
2023-09-26 14:01:11.837120 [INFO] CUDA Device 0
2023-09-26 14:01:11.837151 [INFO] Name: Quadro M4000
2023-09-26 14:01:11.837178 [INFO] Global Memory: 8589672448 bytes
2023-09-26 14:01:11.837188 [INFO] Constant Memory: 65536 bytes
2023-09-26 14:01:11.837214 [INFO] Shared Mem per Block: 49152 bytes
2023-09-26 14:01:11.837242 [INFO] 32bit Registers per Block: 65536
2023-09-26 14:01:11.837250 [INFO] Max Threads per Block: 1024
2023-09-26 14:01:11.837287 [INFO] Warpsize: 32 threads
2023-09-26 14:01:11.837315 [INFO] Multiprocessors: 13
2023-09-26 14:01:11.837637 [INFO] == Using device 0 ==
2023-09-26 14:01:11.837669 [INFO] CUDA Configuration
2023-09-26 14:01:11.837676 [INFO] Concurrent Streams: 2
2023-09-26 14:01:11.837702 [INFO] Curves per Batch: 32768
2023-09-26 14:01:11.837730 [INFO] Threads per Block: 128
2023-09-26 14:01:11.837767 [INFO] Blocks per Stream: 256
2023-09-26 14:01:11.837799 [INFO] ECM Configuration
2023-09-26 14:01:11.837807 [INFO] B1 (powersmooth): 50000
2023-09-26 14:01:11.837812 [INFO] B2: 500000
2023-09-26 14:01:11.837817 [INFO] Max Effort: 1000
2023-09-26 14:01:11.837822 [INFO] Curve generator: GKL2016_j4
2023-09-26 14:01:11.837904 [INFO] CUDA using 128 threads per block.
2023-09-26 14:01:11.837935 [INFO] CUDA using 256 blocks.
2023-09-26 14:01:11.838024 [WARN] input thread start
2023-09-26 14:01:11.874378 [INFO] Initial Task Queue: 309 tasks
2023-09-26 14:01:11.939323 [INFO] Stage 1 Initialization of device #0
2023-09-26 14:01:12.063646 [INFO] [Device 0] Stage 2 Initialization...
2023-09-26 14:01:12.212689 [INFO] [Device 0] Stage 2 Giantstep buffer: 5093523456B free memory, using 7864320B (197 points) per thread)
2023-09-26 14:01:12.224743 [INFO] Stage 2 init done
2023-09-26 14:01:13.959745 [INFO] [Thread 1] 32768 tasks in batch for Stage 1
2023-09-26 14:01:13.002457 [INFO] [Thread 0] 32768 tasks in batch for Stage 1
2023-09-26 14:01:16.163838 [INFO] [Device 0] [Thread 1] Stage 1 Performance: 32768 curves in 2204ms (14867 c/s)
2023-09-26 14:01:16.209453 [INFO] [Device 0] [Thread 0] Stage 1 Performance: 32768 curves in 2207ms (14848 c/s)
2023-09-26 14:01:17.826552 [INFO] [Device 0] [Thread 1] Stage 2 Performance: 32768 curves in 1663ms (19708 c/s)
2023-09-26 14:01:17.947187 [INFO] [Device 0] [Thread 0] Stage 2 Performance: 32768 curves in 1738ms (18857 c/s)
2023-09-26 14:01:17.947238 [INFO] Task Queue: 242 tasks
2023-09-26 14:01:19.576396 [INFO] [Thread 1] 32768 tasks in batch for Stage 1
2023-09-26 14:01:19.730936 [INFO] [Thread 0] 32768 tasks in batch for Stage 1
2023-09-26 14:01:21.507658 [INFO] [Device 0] [Thread 1] Stage 1 Performance: 32768 curves in 1931ms (16968 c/s)
2023-09-26 14:01:21.756818 [INFO] [Device 0] [Thread 0] Stage 1 Performance: 32768 curves in 2026ms (16175 c/s)
2023-09-26 14:01:23.216437 [INFO] [Device 0] [Thread 1] Stage 2 Performance: 32768 curves in 1709ms (19177 c/s)
2023-09-26 14:01:23.437845 [INFO] [Device 0] [Thread 0] Stage 2 Performance: 32768 curves in 1681ms (19495 c/s)
2023-09-26 14:01:23.437921 [INFO] Task Queue: 175 tasks
2023-09-26 14:01:24.976318 [INFO] [Thread 1] 32768 tasks in batch for Stage 1
2023-09-26 14:01:25.210125 [INFO] [Thread 0] 32768 tasks in batch for Stage 1
2023-09-26 14:01:27.264052 [INFO] [Device 0] [Thread 1] Stage 1 Performance: 32768 curves in 2288ms (14324 c/s)
2023-09-26 14:01:27.503159 [INFO] [Device 0] [Thread 0] Stage 1 Performance: 32768 curves in 2293ms (14291 c/s)
2023-09-26 14:01:28.993983 [INFO] [Device 0] [Thread 1] Stage 2 Performance: 32768 curves in 1730ms (18943 c/s)
2023-09-26 14:01:29.237622 [INFO] [Device 0] [Thread 0] Stage 2 Performance: 32768 curves in 1734ms (18893 c/s)
2023-09-26 14:01:29.237674 [INFO] Task Queue: 108 tasks
2023-09-26 14:01:30.736050 [INFO] [Thread 1] 32768 tasks in batch for Stage 1
2023-09-26 14:01:30.982581 [INFO] [Thread 0] 32768 tasks in batch for Stage 1
2023-09-26 14:01:32.848146 [INFO] [Device 0] [Thread 1] Stage 1 Performance: 32768 curves in 2112ms (15515 c/s)
2023-09-26 14:01:33.025715 [INFO] [Device 0] [Thread 0] Stage 1 Performance: 32768 curves in 2043ms (16039 c/s)
2023-09-26 14:01:34.542253 [INFO] [Device 0] [Thread 1] Stage 2 Performance: 32768 curves in 1694ms (19343 c/s)
2023-09-26 14:01:34.724778 [INFO] [Device 0] [Thread 0] Stage 2 Performance: 32768 curves in 1699ms (19287 c/s)
2023-09-26 14:01:34.724830 [INFO] Task Queue: 44 tasks
2023-09-26 14:01:35.898014 [INFO] [Thread 0] 20915 tasks in batch for Stage 1
2023-09-26 14:01:35.898097 [INFO] [Thread 1] 25941 tasks in batch for Stage 1
2023-09-26 14:01:37.175388 [INFO] [Device 0] [Thread 0] Stage 1 Performance: 20915 curves in 1277ms (16375 c/s)
2023-09-26 14:01:37.479382 [INFO] [Device 0] [Thread 1] Stage 1 Performance: 25941 curves in 1581ms (16406 c/s)
2023-09-26 14:01:38.303604 [INFO] [Device 0] [Thread 0] Stage 2 Performance: 20915 curves in 1128ms (18539 c/s)
2023-09-26 14:01:38.303662 [INFO] Task Queue: 0 tasks
2023-09-26 14:01:38.844777 [INFO] [Device 0] [Thread 1] Stage 2 Performance: 25941 curves in 1365ms (19000 c/s)
2023-09-26 14:01:38.844932 [INFO] [Total] Stage 1&2 Performance: 309000 curves in 26620ms (11608 c/s)
2023-09-26 14:01:38.844976 [INFO] Final Task Queue: 0 tasks

but output.txt looks like:

308 1
307 1
0 1
305 1
304 1
306 1
301 1
...
8 1
10 1
7 1
6 1
5 1
4 1
2 1
3 1
1 1
DONE

Config.ini:

; Example config file for co-ecm
[general]

; server or file
mode = file

; Logfile location
logfile = ./log.txt

; Output file of abandoned, i.e. unsolved tasks.
; Format is the same as the output format, without listing factors
;
; Example line:
; 44800523911798220433379600867; # effort 112
output_abandoned = ./bw192easy-stage2.abandoned.txt

; Log level
;
; 1: "VERBOSE",
; 2: "DEBUG",
; 3: "INFO",
; 4: "WARNING",
; 5: "ERROR",
; 6: "FATAL",
; 7: "NONE"
; Default is set at compile time.
loglevel = 1

; Use a random seed for the random number generator used to generate points and
; curves. If set to 'false', each run of the program will behave the same
; provided the same input data.
; Default: true
random = false

[server]
port = 11111

[file]
; Input file.
; The input file should contain a single number to be factored per line. Lines
; starting with anything but a digit are skipped.
;
; Example line:
; 44800523911798220433379600867
input = ./input.txt

; Output file.
; Each fully factored input number is appended to the output on its own line in
; the format
; (input number);(factor),(factor),(factor), # effort: (number of curves)
;
; Example line:
; 44800523911798220433379600867;224536506062699,199524454608233, # effort: 12
output = ./output.txt

[cuda]

; Number of concurrent cuda streams to issue to GPU
; Default: 2
streams = 2

; Number of threads per block for cuda kernel launches.
; Set to auto to determine setting for maximum parallel resident blocks per SM at runtime.
; Note: The settings determined by 'auto' are not always automatically the optimal setting for maximum throughput.
; Default: auto
threads_per_block = 128

; Constant memory is used for (smaller) scalars during point multiplication.
; When the scalar is too large to fit into constant memory or this option is set
; to 'false', global device memory is used.
; Default: true
use_const_memory = false

[ecm]
; Redo ECM until numbers are fully factored.
; If set to false, only the first factor is returned.
; Default: false
find_all_factors = true

; Set the computation of the scalar s for point multiplication. With
; 'powersmooth' set to 'true', then s = lcm(2, ..., b1). If set to false,
; s = primorial(2, ..., b1), i.e. the product of all primes less than or equal
; to b1.
; Default: true
powersmooth = true

; Bound b1 for stage 1 of ecm.
b1 = 50000

; Bound b2 for stage 2 of ecm.
b2 = 500000

; Maximum effort per input number.
; With each curve, the already spent effort is incremented. Thus, with effort
; set to 100, ecm stage1 (and stage2) will be executed on 100 curves per input
; number.
; Default: 10
effort = 1000

; Set the curve generator function.
; Use 2 under normal circumstances.
; 0: "Naive"
; 1: "GKL2016_j1"
; 2: "GKL2016_j4"
; Default: 2
curve_gen = 2

; Use only points for finding factors that are off curve.
; After point multiplication, use all resulting points to find factors. If set
; to 'false' coordinates of points will be checked that do not fulfill the curve
; equation.
; Settings for stage1 and stage2 respectively.
; Default: true
stage1.check_all = true
stage2.check_all = true

; Enable/Disable stage 2.
; If set to 'false', only stage 1 of ECM is performed.
; Default: true
stage2.enabled = true

; Set the window size for stage 2
; Default: 2310
;stage2.window_size = 2310

/ecmongpu-master/CMakeLists.txt:

cmake_minimum_required(VERSION 0.7)
project(co-ecm C CUDA)

set(VERSION "v0.8")
set(CMAKE_BUILD_TYPE Release)

set(BUILD_BENCHMARKS 1)

# Options and Settings

# Log level: VERBOSE, DEBUG, INFO, WARNING, ERROR, FATAL

set(LOG_LEVEL INFO)

# Multi precision limbs, use 32 or 64 bit datatype

set(LIMB_BITS 32)

# Bits of basic multi precision datatype. Has to be 32 bit more than the modulus to be factored

if(DEFINED ENV{BITWIDTH})
set(BITWIDTH $ENV{BITWIDTH})
else()
set(BITWIDTH 352)
endif()
message("Building for ${BITWIDTH}-bit moduli")

# Default CUDA threads per block.

set(BLOCK_SIZE 128)

# Curves per single batch.

set(BATCH_JOB_SIZE 32768)

# Window size for w-NAF.

if(DEFINED ENV{WINDOW_SIZE})
set(NAF_WINDOW_SIZE $ENV{WINDOW_SIZE})
else()
set(NAF_WINDOW_SIZE 4)
endif()
message("Building with window size w=${NAF_WINDOW_SIZE}")

# Default allocated number of NAF digits for ECM stage 2.
# If needed for large Stage 2 bounds, you will be asked to increase this value and recompile.

set(NAF_STAGE2_DEFAULT_DIGITS 20)

# Use optimized precomputation

if(NOT DEFINED ENV{DISABLE_OPTIMIZED_PRECOMP})
set(OPTIMIZE_PRECOMP 1)
endif()
if(NOT DEFINED OPTIMIZE_PRECOMP)
message("Optimized point representation disabled")
endif()

# Choose a Montgomery product algorithm

if(DEFINED ENV{MON_PROD})
if("$ENV{MON_PROD}" STREQUAL "CIOS")
set(MON_PROD_CIOS 1)
endif()
if("$ENV{MON_PROD}" STREQUAL "CIOS_XMAD")
set(MON_PROD_CIOS_XMAD 1)
endif()
if("$ENV{MON_PROD}" STREQUAL "FIPS")
set(MON_PROD_FIPS 1)
endif()
if("$ENV{MON_PROD}" STREQUAL "FIOS")
set(MON_PROD_FIOS 1)
endif()
else ()
set(MON_PROD_FIPS 1)
endif()

set(COORDINATES_EXTENDED 1)
#set(COORDINATES_INVERTED 1)

# Set the maximum number of registers during compilation.
# Low values result in excessive spilling to slow local (i.e. global) memory.

#set(GPU_MAX_REG 64)

# Set CUDA architectures to generate binary code for

if(DEFINED ENV{GPU_ARCH})
set(GPU_ARCHITECTURE $ENV{GPU_ARCH})
message("Building for CUDA architecture ${GPU_ARCHITECTURE}")
else()
execute_process(
COMMAND bash -c "${CUDA_TOOLKIT_ROOT_DIR}/extras/demo_suite/deviceQuery | grep 'CUDA Capability' | sed -rn ':a;N;$!ba;s/.*:\s(.+)\.(.+).*/sm_\1\2/p'"
OUTPUT_VARIABLE GPU_ARCHITECTURE
ERROR_QUIET
OUTPUT_STRIP_TRAILING_WHITESPACE
)
if("${GPU_ARCHITECTURE}" STREQUAL "")
set(GPU_ARCHITECTURE "sm_60;sm_61;sm_70;sm_75")
message("Could not detect CUDA device architecture, using ${GPU_ARCHITECTURE}")
else()
message("Detected CUDA architecture ${GPU_ARCHITECTURE}")
endif()
endif()

# Build setup
# Do not edit

list(APPEND CMAKE_MODULE_PATH "${PROJECT_SOURCE_DIR}/cmake")

find_package(GMP REQUIRED)

# Set CUDA compiler flags

set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -lineinfo -Xptxas=-v -lineinfo --keep --compiler-options='-Wall -Wno-unknown-pragmas'")

if(NOT "${GPU_MAX_REG}" STREQUAL "")
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -maxrregcount=${GPU_MAX_REG}")
endif()

foreach(ARCH IN LISTS GPU_ARCHITECTURE)
string(REPLACE "sm_" "compute_" COMPUTE ${ARCH})
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} --generate-code arch=${COMPUTE},code=${ARCH}")
endforeach(ARCH)

include(CTest)

# Generate version.h

include(${CMAKE_MODULE_PATH}/version.cmake)

# Generate build_config.h

configure_file(
include/build_config.h.in
${CMAKE_CURRENT_BINARY_DIR}/generated/build_config.h
)

set(CMAKE_C_STANDARD 99)

# Concatenate all .cu files into a single kernel file for nvcc to
# work around the nvcc limitations of slow non-separable builds.
# (nvcc does not know link-time optimization)

function(add_cuda_executable TARGET)
set(CURRENT_C_SOURCES ${ARGN})
set(CURRENT_CUDA_SOURCES ${ARGN})
list(FILTER CURRENT_C_SOURCES EXCLUDE REGEX ".*\.cu")
list(FILTER CURRENT_CUDA_SOURCES INCLUDE REGEX ".*\.cu")
file(MAKE_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/generated)
add_custom_command(
OUTPUT generated/${TARGET}-cudakernel.cu
COMMAND cat ${CURRENT_CUDA_SOURCES} > ${CMAKE_CURRENT_BINARY_DIR}/generated/${TARGET}-cudakernel.cu
WORKING_DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}"
DEPENDS ${CURRENT_CUDA_SOURCES}
)
add_executable(${TARGET} ${CURRENT_C_SOURCES} generated/${TARGET}-cudakernel.cu)
endfunction()

# Set include directories

include_directories(${PROJECT_SOURCE_DIR}/include)
include_directories(${CMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES})
include_directories(${CMAKE_CURRENT_BINARY_DIR}/generated)

find_package (Threads)
link_libraries(gmp)
link_libraries(${CMAKE_THREAD_LIBS_INIT})

# Set common source files

set(COMMON_CUDA_SOURCES
${CMAKE_CURRENT_SOURCE_DIR}/src/mp/mp.cu
${CMAKE_CURRENT_SOURCE_DIR}/src/mp/mp_montgomery.cu
${CMAKE_CURRENT_SOURCE_DIR}/src/ecc/naf.cu
${CMAKE_CURRENT_SOURCE_DIR}/src/ecc/tw_ed_common.cu
${CMAKE_CURRENT_SOURCE_DIR}/src/ecm/ecm.cu
${CMAKE_CURRENT_SOURCE_DIR}/src/ecm/factor_task.cu
${CMAKE_CURRENT_SOURCE_DIR}/src/ecm/batch.cu
${CMAKE_CURRENT_SOURCE_DIR}/src/ecm/stage1.cu
${CMAKE_CURRENT_SOURCE_DIR}/src/ecm/stage2.cu
${CMAKE_CURRENT_SOURCE_DIR}/src/config/config.cu
${CMAKE_CURRENT_SOURCE_DIR}/src/config/handler.cu
)

# Set coordinate specific ecc and ecm implementations

if(COORDINATES_EXTENDED)
set(COMMON_CUDA_SOURCES
${COMMON_CUDA_SOURCES}
${CMAKE_CURRENT_SOURCE_DIR}/src/ecc/tw_ed_extended.cu
${CMAKE_CURRENT_SOURCE_DIR}/src/ecm/tw_ed_extended.cu
)
elseif(COORDINATES_INVERTED)
set(COMMON_CUDA_SOURCES
${COMMON_CUDA_SOURCES}
${CMAKE_CURRENT_SOURCE_DIR}/src/ecc/tw_ed_inverted.cu
${CMAKE_CURRENT_SOURCE_DIR}/src/ecm/tw_ed_inverted.cu
)
endif()

set(COMMON_C_SOURCES
${CMAKE_CURRENT_SOURCE_DIR}/src/gmp_conv/gmp_conversion.c
${CMAKE_CURRENT_SOURCE_DIR}/src/config/ini.c
${CMAKE_CURRENT_SOURCE_DIR}/src/log.c
${CMAKE_CURRENT_SOURCE_DIR}/src/input/file.c
${CMAKE_CURRENT_SOURCE_DIR}/src/input/tcp.c
${CMAKE_CURRENT_SOURCE_DIR}/src/input/parser.c
)

# Set output directory

set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/bin)

add_subdirectory(src)
add_subdirectory(tests)
add_subdirectory(bench)
add_subdirectory(resource)
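One thing I noticed while re-reading the file: the BITWIDTH comment above says the datatype has to be 32 bits more than the modulus to be factored. If I read that correctly, my 344-bit inputs would need a larger BITWIDTH than the 352 I built with (this is just my arithmetic from that comment, I have not verified it against the source):

```python
# My reading of the CMakeLists comment: BITWIDTH must be >= modulus size + 32 bits.
modulus_bits = 344            # bit length of the input numbers
bitwidth = 352                # BITWIDTH set at compile time
required = modulus_bits + 32
print(required, bitwidth, bitwidth >= required)  # 376 352 False
```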

Each time I change the CMakeLists file, I run cmake and make to regenerate everything.

I am not familiar with the mathematics, so I probably made an important mistake in my settings.

Thank you
