mark-g-y/Distributed-Computing-Framework

Distributed-Computing-Framework

This is a flexible distributed data processing framework that automatically scales a user-specified processing function to run in parallel across multiple machines. Fault tolerance is built in: the system keeps track of the data and reruns the analysis if a machine fails.
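The track-and-rerun idea can be illustrated on a single machine. The sketch below is not the framework's implementation or API: it uses local threads as stand-ins for worker machines, and the names RetrySketch, processChunk, and runAll are hypothetical. It splits the data into chunks, remembers which chunks have not finished, and resubmits any chunk whose worker failed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.IntUnaryOperator;

// Hypothetical single-machine sketch; threads stand in for worker machines.
public class RetrySketch {

    // Applies the user-specified function to every element of one chunk.
    static int[] processChunk(int[] chunk, IntUnaryOperator fn) {
        int[] out = new int[chunk.length];
        for (int i = 0; i < chunk.length; i++) {
            out[i] = fn.applyAsInt(chunk[i]);
        }
        return out;
    }

    // Processes all chunks in parallel and reruns failed chunks, up to maxAttempts rounds.
    static int[][] runAll(int[][] chunks, IntUnaryOperator fn, int workers, int maxAttempts) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        int[][] results = new int[chunks.length][];
        try {
            // Indices of chunks that have not completed successfully yet.
            List<Integer> pending = new ArrayList<>();
            for (int i = 0; i < chunks.length; i++) {
                pending.add(i);
            }
            for (int attempt = 1; attempt <= maxAttempts && !pending.isEmpty(); attempt++) {
                Map<Integer, Future<int[]>> inFlight = new LinkedHashMap<>();
                for (int idx : pending) {
                    final int[] chunk = chunks[idx];
                    inFlight.put(idx, pool.submit(() -> processChunk(chunk, fn)));
                }
                List<Integer> failed = new ArrayList<>();
                for (Map.Entry<Integer, Future<int[]>> e : inFlight.entrySet()) {
                    try {
                        results[e.getKey()] = e.getValue().get();
                    } catch (ExecutionException ex) {
                        failed.add(e.getKey()); // the worker failed: rerun this chunk next round
                    } catch (InterruptedException ex) {
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException("interrupted while waiting for a chunk", ex);
                    }
                }
                pending = failed;
            }
            if (!pending.isEmpty()) {
                throw new IllegalStateException("chunks still failing: " + pending);
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[][] chunks = {{1, 2}, {3, 4}, {5}};
        AtomicBoolean failedOnce = new AtomicBoolean(false);
        // Doubles each value, but throws the first time it sees 3 to simulate a machine failure.
        IntUnaryOperator flakyDouble = x -> {
            if (x == 3 && failedOnce.compareAndSet(false, true)) {
                throw new RuntimeException("simulated worker failure");
            }
            return x * 2;
        };
        System.out.println(Arrays.deepToString(runAll(chunks, flakyDouble, 2, 3)));
        // prints [[2, 4], [6, 8], [10]]
    }
}
```

In this sketch the second chunk fails on the first round and is resubmitted on the second, so the final result is complete despite the simulated failure. A real distributed system would also have to detect unresponsive machines, which this local version does not attempt.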

To use the framework:

  • Start with the example/MultiplyByTwo.java tutorial, which shows how to implement a job that processes large data sets with this framework.

Example (how to compile & run the example/MultiplyByTwo.java job):

  • Assume ${HOME} is the source directory of this framework
  • Navigate into the example directory
  • javac -cp "${HOME}/target/distributedcomputing-1.0-SNAPSHOT.jar;${HOME}/lib/json-20140107.jar" MultiplyByTwo.java
    • The ; classpath separator shown here is the Windows form; on Linux or macOS, use : instead
    • In your own project, replace MultiplyByTwo.java with your own Java files
  • jar cf multiplybytwo.jar MultiplyByTwo.class
  • Create or modify the server.config file (see example/server.config), which lists the worker machines with their hostnames and ports
  • ${HOME}/compute multiplybytwo.jar MultiplyByTwo worker 1234 (repeat for each worker)
  • ${HOME}/compute multiplybytwo.jar MultiplyByTwo manager
    • Note: in both commands, MultiplyByTwo is the name of the class that extends TaskPipeline. Replace it with your own class name

To run your own job, follow the same steps with your own file, class, and jar names and paths.
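One detail of the javac step is platform-dependent: the character that separates classpath entries is ; on Windows and : on Linux and macOS. As a small sketch, the classpath for that step can be built portably with File.pathSeparator. The class name ClasspathSketch and the frameworkDir parameter are hypothetical; the two jar paths are the ones from the instructions above.

```java
import java.io.File;

// Hypothetical helper: prints a classpath string suited to the current platform.
public class ClasspathSketch {

    // Joins the two jars from the compile step with the platform's classpath separator.
    static String buildClasspath(String frameworkDir) {
        return String.join(File.pathSeparator,
                frameworkDir + "/target/distributedcomputing-1.0-SNAPSHOT.jar",
                frameworkDir + "/lib/json-20140107.jar");
    }

    public static void main(String[] args) {
        // frameworkDir is the framework's source directory; defaults to the current directory.
        String frameworkDir = args.length > 0 ? args[0] : ".";
        System.out.println(buildClasspath(frameworkDir));
    }
}
```

The printed string can be passed to javac -cp in place of the quoted classpath in the compile step.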
