close #19#23

Merged
freelw merged 33 commits into main from wangli_dev_20250608_1
Jun 8, 2025

@freelw freelw commented Jun 8, 2025

lm support

@freelw freelw requested a review from Copilot June 8, 2025 12:11
@freelw freelw merged commit f12e3f0 into main Jun 8, 2025
1 check passed

Copilot AI left a comment

Pull Request Overview

Adds language model (LM) support with a new LM target, decoder modules, dataloaders, and updates build & documentation.

  • Introduce LMDecoderBlock and LMDecoder classes under module/language_model/
  • Add LMDataLoader, Vocab reuse, and update makefile, README, and logs for LM
  • Wire up new lm.cpp entry point and VSCode settings for debugging

Reviewed Changes

Copilot reviewed 33 out of 33 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| module/translation/seq2seq.h | Redirect encoder/decoder includes into translation/ subdir |
| module/language_model/*.{h,cpp} | New LM decoder block and decoder implementation |
| dataloaders/language_model/*.{h,cpp} | New LM data loader and vocab integration |
| dataloaders/vocab.{h,cpp} | Shared Vocab class for translation & LM |
| lm.cpp | New main entry for training/inference of language model |
| makefile | Added lm target and include paths for new modules |
| README.md, log.md | Documentation and logs updated with LM usage examples |
| .vscode/settings.json, .vscode/launch.json | Debug settings for LM |

Comments suppressed due to low confidence (4)

dataloaders/language_model/lm_dataloader.h:1

  • The include guard has a typo (LM_DADALOADER_H); it should match the filename and read LM_DATALOADER_H.
#ifndef LM_DADALOADER_H
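
For reference, a minimal sketch of what the corrected guard could look like. The header's real contents are not shown in this review, so `LMDataLoaderPlaceholder` is a hypothetical stand-in used only to give the guard something to protect.

```cpp
// lm_dataloader.h (sketch): the guard name now matches the filename.
#ifndef LM_DATALOADER_H
#define LM_DATALOADER_H

// Hypothetical placeholder; the actual declarations are not part of this review.
struct LMDataLoaderPlaceholder {
    int batch_size = 0;
};

#endif  // LM_DATALOADER_H
```

The `#ifndef` and `#define` lines must use the same macro name; with the typo applied consistently the guard still works, so this is a readability fix rather than a behavior change.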

module/language_model/lm_decoder_block.h:1

  • [nitpick] No unit tests are provided for LMDecoderBlock; consider adding tests for its forward and get_parameters methods to cover key behaviors.
#ifndef LM_DECODER_BLOCK_H

module/language_model/lm_decoder_block.cpp:1

  • The header include uses a bare filename; to avoid ambiguity and ensure correct header resolution, consider using the full relative path: #include "module/language_model/lm_decoder_block.h".
#include "lm_decoder_block.h"

module/language_model/lm_decoder.cpp:1

  • Similar to the block file, include the decoder header via its subdirectory path: #include "module/language_model/lm_decoder.h" for consistency and to prevent collisions.
#include "lm_decoder.h"

            break;
        default:
            std::cerr << "Usage: " << argv[0]
                      << " -f <corpus> -c <checpoint> -e <epochs>" << std::endl;

Copilot AI Jun 8, 2025

Fix the typo in the usage message: -c <checpoint> should be -c <checkpoint>.

Suggested change
- << " -f <corpus> -c <checpoint> -e <epochs>" << std::endl;
+ << " -f <corpus> -c <checkpoint> -e <epochs>" << std::endl;
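
One way to keep this kind of typo from recurring is to build the usage string in a single helper. The sketch below assumes the options are parsed with POSIX `getopt`, which the `default:` branch in the snippet suggests but the review does not confirm; `LmOptions`, `usage`, and `parse_args` are hypothetical names, not taken from lm.cpp.

```cpp
#include <cassert>
#include <cstdlib>
#include <iostream>
#include <string>
#include <unistd.h>  // getopt, optarg, optind (POSIX)

// Hypothetical container for the three options named in the usage message.
struct LmOptions {
    std::string corpus;
    std::string checkpoint;
    int epochs = 0;
    bool ok = true;  // set to false when an unknown option is seen
};

// Single source of truth for the usage text.
std::string usage(const std::string& prog) {
    return "Usage: " + prog + " -f <corpus> -c <checkpoint> -e <epochs>";
}

// Parses -f, -c and -e; prints the usage line on any unrecognized option.
LmOptions parse_args(int argc, char* argv[]) {
    assert(argc >= 1 && argv != nullptr);
    LmOptions opts;
    optind = 1;  // restart scanning so the function can be called more than once
    int opt;
    while ((opt = getopt(argc, argv, "f:c:e:")) != -1) {
        switch (opt) {
            case 'f': opts.corpus = optarg; break;
            case 'c': opts.checkpoint = optarg; break;
            case 'e': opts.epochs = std::atoi(optarg); break;
            default:
                std::cerr << usage(argv[0]) << std::endl;
                opts.ok = false;
                break;
        }
    }
    return opts;
}
```

Calling `parse_args(argc, argv)` from `main` and checking `ok` before training would be enough to wire this in, under the same assumptions.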
