Async file close and S3 upload (Good First Issue) #92

Open
github-actions bot opened this issue Feb 14, 2025 · 8 comments
Labels
beginner Issues for beginner (Documentaion Update etc)

Comments

@github-actions

construct full file path


// TODO: Async file close and S3 upload (Good First Issue)
// construct full file path
filePath := filepath.Join(p.config.Path, basePath, fileMetadata.fileName)
// Remove empty files
if fileMetadata.recordCount == 0 {


This issue was generated by todo-issue based on a TODO comment in 1ed362e. It's been assigned to @hash-data because they committed the code.
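
For reference, a minimal sketch of what the TODO asks for: closing each file, deleting empty ones, and uploading the rest, with every file handled in its own goroutine. The `fileMeta` struct, the `uploader` interface, and the helper names are hypothetical stand-ins for illustration; they are not the project's types, and the real S3 client call is left behind the interface.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"sync"
)

// fileMeta is a hypothetical stand-in for the fileMetadata value in the TODO.
type fileMeta struct {
	fileName    string
	recordCount int
	writer      io.Closer
}

// uploader is a hypothetical abstraction over whatever S3 client is in use.
type uploader interface {
	Upload(ctx context.Context, localPath string) error
}

// finishFile closes one file, then removes it if empty or uploads it otherwise.
func finishFile(ctx context.Context, dir string, f fileMeta, up uploader) error {
	filePath := filepath.Join(dir, f.fileName)
	if err := f.writer.Close(); err != nil {
		return fmt.Errorf("close %s: %w", filePath, err)
	}
	if f.recordCount == 0 {
		return os.Remove(filePath) // remove empty files
	}
	return up.Upload(ctx, filePath)
}

// closeAndUpload runs finishFile per file in its own goroutine and joins all failures.
func closeAndUpload(ctx context.Context, dir string, files []fileMeta, up uploader) error {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		errs []error
	)
	for _, f := range files {
		wg.Add(1)
		go func(f fileMeta) {
			defer wg.Done()
			if err := finishFile(ctx, dir, f, up); err != nil {
				mu.Lock()
				errs = append(errs, err)
				mu.Unlock()
			}
		}(f)
	}
	wg.Wait()
	return errors.Join(errs...) // nil when errs is empty
}

// countingUploader is a test double that only counts Upload calls.
type countingUploader struct {
	mu    sync.Mutex
	count int
}

func (u *countingUploader) Upload(context.Context, string) error {
	u.mu.Lock()
	u.count++
	u.mu.Unlock()
	return nil
}

// demo creates one non-empty and one empty file in a temp directory and reports
// the number of uploads and whether the empty file was removed.
func demo() (int, bool, error) {
	dir, err := os.MkdirTemp("", "parquet-demo")
	if err != nil {
		return 0, false, err
	}
	defer os.RemoveAll(dir)
	var files []fileMeta
	for name, count := range map[string]int{"data.parquet": 3, "empty.parquet": 0} {
		fh, err := os.Create(filepath.Join(dir, name))
		if err != nil {
			return 0, false, err
		}
		files = append(files, fileMeta{fileName: name, recordCount: count, writer: fh})
	}
	up := &countingUploader{}
	err = closeAndUpload(context.Background(), dir, files, up)
	_, statErr := os.Stat(filepath.Join(dir, "empty.parquet"))
	return up.count, errors.Is(statErr, os.ErrNotExist), err
}

func main() {
	fmt.Println(demo())
}
```

Joining the errors with `errors.Join` (Go 1.20+) reports every failed file rather than only the first; whether that or fail-fast behavior is preferable here is a design choice the issue does not settle.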
@mrmagicpotato007
Contributor

@hash-data I can take this if it's still unassigned.

@hash-data
Collaborator

@mrmagicpotato007 Sure, you can take it.

@hash-data hash-data added beginner Issues for beginner (Documentaion Update etc) and removed todo 🗒️ labels Feb 17, 2025
@mrmagicpotato007
Contributor

mrmagicpotato007 commented Feb 17, 2025

@hash-data My plan is to use errgroup over each slice of parquetFiles, processing every file in its own goroutine. Am I on the right track?

func processPartitionedFiles(p *Processor) error {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	g, ctx := errgroup.WithContext(ctx)

	for basePath, parquetFiles := range p.partitionedFiles {
		for _, fileMetadata := range parquetFiles {
			basePath, fileMetadata := basePath, fileMetadata // Capture loop variables

			g.Go(func() error {
				return processFile(ctx, basePath, fileMetadata, p)
			})
		}
	}

	return g.Wait() // Wait for all tasks to complete, returns first error if any
}
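
To make the "first error cancels the rest" behavior of this proposal concrete, here is a standard-library-only sketch of the same fan-out pattern. It swaps errgroup for `sync.WaitGroup` plus `sync.Once`, so it is an approximation of what `errgroup.WithContext` provides, not its implementation. The `task` type, `runAll`, and `fakeProcess` are hypothetical names used only for this illustration.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// task pairs a partition path with a file name; both fields are hypothetical.
type task struct{ basePath, fileName string }

// runAll starts one goroutine per task, records the first error, and cancels
// the shared context so the remaining tasks can stop early.
func runAll(parent context.Context, tasks []task, process func(context.Context, task) error) error {
	ctx, cancel := context.WithCancel(parent)
	defer cancel()

	var (
		wg       sync.WaitGroup
		once     sync.Once
		firstErr error
	)
	for _, t := range tasks {
		wg.Add(1)
		go func(t task) { // t is passed explicitly, so this is safe before Go 1.22 too
			defer wg.Done()
			if err := process(ctx, t); err != nil {
				once.Do(func() {
					firstErr = err // set before cancel so later ctx errors never win
					cancel()
				})
			}
		}(t)
	}
	wg.Wait()
	return firstErr
}

// fakeProcess fails for one file and otherwise blocks until it is cancelled,
// imitating a long upload that honors context cancellation.
func fakeProcess(ctx context.Context, t task) error {
	if t.fileName == "bad.parquet" {
		return fmt.Errorf("upload failed: %s", t.fileName)
	}
	<-ctx.Done()
	return ctx.Err()
}

// demoRunAll returns the message of the error reported by runAll.
func demoRunAll() string {
	tasks := []task{
		{"p=1", "ok.parquet"},
		{"p=1", "bad.parquet"},
		{"p=2", "ok2.parquet"},
	}
	return runAll(context.Background(), tasks, fakeProcess).Error()
}

func main() {
	fmt.Println(demoRunAll())
}
```

Cancellation only helps if the per-file function actually checks `ctx`; a `processFile` that ignores it would run to completion regardless. Note also that since Go 1.22 each loop iteration gets fresh variables, so the explicit capture in the proposal is only required on older toolchains.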

@hash-data hash-data moved this from Todo to In progress in Olake Roadmap 2024-25 Feb 18, 2025
@hash-data hash-data removed their assignment Feb 18, 2025
@mrmagicpotato007
Contributor

@hash-data Please let me know your thoughts on this.

@hash-data
Collaborator

@mrmagicpotato007 We have a Concurrent function in utils that can be used here.
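
The thread does not show the utils helper, so the following is only a sketch of what a generic bounded-concurrency helper of this kind could look like. The name `concurrentEach`, its signature, and the `limit` parameter are assumptions made for illustration and should not be read as the project's actual API.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// concurrentEach is a hypothetical helper: it calls fn for every item while
// keeping at most limit calls in flight, and returns all errors joined.
func concurrentEach[T any](ctx context.Context, items []T, limit int, fn func(context.Context, T) error) error {
	if limit < 1 {
		limit = 1 // guard: an unbuffered semaphore would block forever
	}
	sem := make(chan struct{}, limit) // buffered channel used as a counting semaphore
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		errs []error
	)
	for _, item := range items {
		sem <- struct{}{} // blocks while limit goroutines are already running
		wg.Add(1)
		go func(item T) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			if err := fn(ctx, item); err != nil {
				mu.Lock()
				errs = append(errs, err)
				mu.Unlock()
			}
		}(item)
	}
	wg.Wait()
	return errors.Join(errs...)
}

// demoLimit processes six items with a limit of two and reports the peak
// number of simultaneous calls, the number of items processed, and the error.
func demoLimit() (int32, int32, error) {
	var inFlight, peak, processed atomic.Int32
	work := func(_ context.Context, _ int) error {
		cur := inFlight.Add(1)
		for { // raise peak to cur if cur is larger
			p := peak.Load()
			if cur <= p || peak.CompareAndSwap(p, cur) {
				break
			}
		}
		time.Sleep(5 * time.Millisecond) // simulate close + upload latency
		inFlight.Add(-1)
		processed.Add(1)
		return nil
	}
	err := concurrentEach(context.Background(), []int{1, 2, 3, 4, 5, 6}, 2, work)
	return peak.Load(), processed.Load(), err
}

func main() {
	peak, processed, err := demoLimit()
	fmt.Println("peak:", peak, "processed:", processed, "err:", err)
}
```

Bounding the number of in-flight uploads matters when a sync produces many parquet files, since one goroutine per file can exhaust file descriptors or network connections. With errgroup, `SetLimit` offers a comparable bound.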

@mrmagicpotato007
Contributor

Yes, I will use the Concurrent func from utils.

@mrmagicpotato007
Contributor

PR: #101

@hash-data Could you review it?

@hash-data
Collaborator

Will review it, @mrmagicpotato007.

@hash-data hash-data linked a pull request Feb 19, 2025 that will close this issue
6 tasks
Projects
Status: In progress