Project 1: Taylor Nelms #13

Open · wants to merge 43 commits into base: master

Changes from all commits · 43 commits
d07b77d
First change to README (test)
taylornelms15 Aug 29, 2019
8f89fd0
naive additions, sans build environment
taylornelms15 Sep 1, 2019
0ae6d43
No-test implementations of some of the flocking functions
taylornelms15 Sep 2, 2019
62cfc59
test-less implementation of 1.2 secton
taylornelms15 Sep 2, 2019
2255089
Naive implementation looking fine, some notes added to README
taylornelms15 Sep 5, 2019
fb2dce8
Putting some progress into the repo out of habit; nearly have the fir…
taylornelms15 Sep 6, 2019
b237e5f
frustrating near-lack of progress
taylornelms15 Sep 6, 2019
91d8603
Finally got through 2.1
taylornelms15 Sep 6, 2019
8d84492
bugfixes on 2.1 discovered when toying with parameters
taylornelms15 Sep 6, 2019
5606081
Part 2.3 working
taylornelms15 Sep 6, 2019
dfc75cd
ready to start the analysis and writeup
taylornelms15 Sep 6, 2019
46c7652
Update README.md
taylornelms15 Sep 8, 2019
3bb47aa
Update README.md
taylornelms15 Sep 8, 2019
697ba91
some timing tools
taylornelms15 Sep 8, 2019
bee8de8
Result data
taylornelms15 Sep 8, 2019
f20a9a4
Merge branch 'master' of github.com:taylornelms15/Project1-CUDA-Flocking
taylornelms15 Sep 8, 2019
cf86e60
animated gifs at top
taylornelms15 Sep 8, 2019
2af67fb
Update README.md
taylornelms15 Sep 8, 2019
da759b1
Update README.md
taylornelms15 Sep 8, 2019
23ef140
Update README.md
taylornelms15 Sep 8, 2019
6e09833
Update README.md
taylornelms15 Sep 8, 2019
419e44d
Update README.md
taylornelms15 Sep 8, 2019
8b2bb1f
more data
taylornelms15 Sep 8, 2019
b66168d
Merge branch 'master' of github.com:taylornelms15/Project1-CUDA-Flocking
taylornelms15 Sep 8, 2019
22d072a
Update README.md
taylornelms15 Sep 8, 2019
b162692
more images
taylornelms15 Sep 8, 2019
d39c6af
Update README.md
taylornelms15 Sep 8, 2019
1de15df
images
taylornelms15 Sep 8, 2019
f94f9b5
imagE
taylornelms15 Sep 8, 2019
afbfa8f
Bad graph for fun
taylornelms15 Sep 8, 2019
c191c09
README analysis
taylornelms15 Sep 8, 2019
d28d39d
graphing info
taylornelms15 Sep 10, 2019
dd4463b
graphing info
taylornelms15 Sep 10, 2019
bc597ca
graphing info
taylornelms15 Sep 10, 2019
06bb6ee
graphing info
taylornelms15 Sep 10, 2019
0d5e513
graphing info
taylornelms15 Sep 10, 2019
12d3893
graphing info
taylornelms15 Sep 10, 2019
c5c02fe
graphing info
taylornelms15 Sep 10, 2019
38f223d
graphing info
taylornelms15 Sep 10, 2019
a5dc05f
Update GRAPHING.md
taylornelms15 Sep 10, 2019
773ed30
graphing info
taylornelms15 Sep 10, 2019
5e19218
graphing info
taylornelms15 Sep 10, 2019
9789726
graphing info
taylornelms15 Sep 10, 2019
120 changes: 120 additions & 0 deletions GRAPHING.md
GRAPHING WITH PYTHON
====================

## Files

The Python script may be found [here](outputData/csvReader.py). The other relevant code is in the `main` file within `src`; if anything here has gaps, or you're curious about the support structure behind some of this functionality, feel free to look there.

## Recording the CSV file

After collecting the data across a number of iterations into a `std::vector` of structs called `eventRecords`, I looped through each record to extract the iteration number, the time elapsed over that block of frames, and the total simulation time up to that point:

```C++
void writeTime(const char* fileName) {
    FILE* of = fopen(fileName, "w");
    if (!of) return;//bail out if the output file could not be opened

    for (auto& record : eventRecords) {
        //record.time holds the elapsed milliseconds across the last TIMEKEEPING_FRAMESIZE frames
        double millisPerFrame = record.time / TIMEKEEPING_FRAMESIZE;
        millisPerFrame /= 1000.0;//seconds per frame
        double fps = 1.0 / millisPerFrame;
        double seconds = record.totalTime / 1000.0;
        fprintf(of, "%d,%0.3f,%f\n", record.frameNo, seconds, fps);
    }//for

    fclose(of);
}//writeTime
```

The loop writes each record straight into the CSV: one row per record, with rows separated by newlines and columns separated by commas.
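
For reference, the first few rows of one of the recorded files, [CoherentGrid_HighDensity_128.csv](outputData/CoherentGrid_HighDensity_128.csv), look like this (frame number, elapsed seconds, then FPS over the last block of frames):

```
300,0.698,429.799427
600,1.326,477.707006
900,1.976,462.249615
```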

For my particular code, the filenames were hard-coded, like the following:

```C++
const char timingFileName[] = "../outputData/CoherentGrid_HighDensity_128.csv";
```

## Using the Data

### Invoking the Python script

The script I wrote takes at least two arguments: the first is the title for the graph (and, by extension, the name of the image to save), and every argument after that is the name of a CSV file in the format recorded above.

For example, a way to use this script would be to call `python csvReader.py "Coherent Grid, High Density, All Block Sizes" CoherentGrid_HighDensity_*.csv`

The main function, along with the imports the snippets here rely on, is as follows:

```Python
import csv
import sys

import matplotlib.pyplot as plt
import numpy as np


def main():
    # argv[0] is the script name and argv[1] is the graph title;
    # every argument after that is treated as a CSV file
    if len(sys.argv) < 3:
        print("Please input a title and file names")
        exit(0)

    resultSets = []
    for fileName in sys.argv[2:]:
        resultSets.append((fileName, readCSV(fileName)))

    makeGraphs(resultSets, sys.argv[1])
```

It takes each CSV file, transforms it into a NumPy array, and pairs that array in a tuple with the name of the file the data came from. It then hands those sets off to a function that graphs them and saves out the resulting graph (or, optionally, displays it onscreen).

### Reading in the CSV data

I made use of the `csv` module, but the file format is so simple that you could quickly write your own parser. The function was the following:

```Python
def readCSV(filename):
    """
    takes in a filename
    returns a numpy array, with the first row as the
    timestamp in seconds, and the second row
    as the fps across the last time block
    """
    results = []
    with open(filename) as csv_file:
        reader = csv.reader(csv_file, delimiter=',')
        for line in reader:
            timestamp = float(line[1])
            fps = float(line[2])
            results.append([timestamp, fps])

    return np.array(results).T
```

If you're not familiar with the `numpy` library, it's a handy library for making array operations more efficient, while also making some simple things entirely too complicated. In this case, the `.T` within the return statement transposes the 2D array, making our two columns into two rows, which will be essential for displaying them nicely.
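
If the transpose feels abstract, here is a tiny self-contained illustration (the numbers are made up for the example, not taken from the real data):

```Python
import numpy as np

# each inner list is one (timestamp, fps) sample, as appended in readCSV
results = np.array([[0.7, 430.0], [1.3, 478.0], [2.0, 462.0]])

print(results.shape)    # (3, 2): three samples, two columns
print(results.T.shape)  # (2, 3): row 0 is all the timestamps, row 1 is all the fps values
```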

### Graphing the data

I used `matplotlib` to graph the data. It has a billion fun features, along with occasionally frustrating documentation, meaning all of my matplotlib code is a hacked-together mess of snippets stolen from forum posts. That said, the goals here were relatively simple: I wanted to take a bunch of 2D arrays representing pairs of `(timestamp, fps)` data points and graph those series onto a line plot.

The code was as follows:

```Python
def makeGraphs(resultSets, title):
    """
    Displays the resultant data sets, along with a given title
    """
    fig, ax = plt.subplots(1)
    for filename, data in resultSets:
        # cleanFileName tidies the filename into a legend label; it isn't shown in this excerpt
        ax.plot(data[0], data[1], label=cleanFileName(filename))

    ax.legend()
    plt.xlabel("Time (seconds)")
    plt.ylabel("Ticks/Frames per Second")

    fig.suptitle(title)
    fig.set_size_inches(10, 6)

    #plt.show() #uncomment this to display the graph on your screen
    filePath = makeSavePath(title)  # makeSavePath builds the output image path; also not shown here
    plt.savefig(filePath)
```

The core functionality is the `plot` call: you provide a series of `x` data and a series of `y` data, along with a handful of other optional arguments, such as a label for the series. I did this for each set of data, stuck some labels onto the plot, and then saved it all out to an image file.
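
The snippet above also calls two small helpers, `cleanFileName` and `makeSavePath`, that aren't shown in this excerpt. A minimal sketch of what they might look like, purely as an assumption for illustration rather than the script's actual implementation:

```Python
import os

def cleanFileName(filename):
    # assumption: strip the directory and extension so the legend shows just the run name
    return os.path.splitext(os.path.basename(filename))[0]

def makeSavePath(title):
    # assumption: save the figure into the images folder, named after the graph title
    return os.path.join("images", title + ".png")
```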

Happy coding!
68 changes: 62 additions & 6 deletions README.md
**University of Pennsylvania, CIS 565: GPU Programming and Architecture,
Project 1 - Flocking**

* Taylor Nelms
* [LinkedIn](https://www.linkedin.com/in/taylor-k-7b2110191/), [twitter](https://twitter.com/nelms_taylor)
* Tested on: Windows 10, Intel i3 Coffee Lake 4-core 3.6GHz processor, 16GB RAM, NVIDIA GeForce GTX 1650 4GB

## Results
![](images/mLo_dMed_gMed.gif)
*No Grid used for implementation*
![](images/mMed_dMed_gMed.gif)
*Uniform grid used within implementation*
![](images/mHi_dMed_gMed.gif)
*Coherent grid used within implementation*

## Analysis

### Implementation Strategy

Unsurprisingly, the grid implementations ended up significantly more efficient than the naive implementation. For runs with 5000 boids and a block size of 128, tracking FPS over a 45-ish-second run yielded the following results:

![](images/All&#32;Grids,&#32;Medium&#32;Density,&#32;Block&#32;Size&#32;128.png)

There are a few things to unpack here. First, there is the spike in the initial framerate of the naive implementation. Frankly, I have no idea why it exists; I was taking data points every 300 ticks, so I can't imagine it being some fluke of initial set-up taking less time. In all honesty, I would need to do significantly more debugging to figure it out.

Of course, the more interesting behavior lies within the meat of the simulation. The grid-based solutions performed better overall, with a slight further improvement for the coherent grid over the uniform grid. Algorithmically, this makes sense: the grid-based approaches cut the per-boid work on the GPU from order `O(N)` to `O(n)`, where `n` is the number of boids within the grid-neighborhood of each boid. (On a CPU, the naive approach would run in time `O(N^2)`, while the grid approaches would run in time `O(Nn)`.)
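
To make the `O(N)` versus `O(n)` point concrete, here is a plain CPU-side Python sketch of the two neighbor searches (this is not the project's CUDA code; the cell width of one neighborhood radius, which makes a 3×3×3 block of cells sufficient, is an assumption of the sketch):

```Python
import math
import random
from collections import defaultdict

RADIUS = 5.0   # assumed neighborhood distance for the sketch
CELL = RADIUS  # assumed grid cell width of one neighborhood radius

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def neighbors_naive(boid, boids):
    # naive approach: every boid checks every other boid, O(N) work per boid
    return [b for b in boids if b is not boid and dist(boid, b) < RADIUS]

def build_grid(boids):
    # bucket each boid by the integer coordinates of its grid cell
    grid = defaultdict(list)
    for b in boids:
        grid[tuple(int(c // CELL) for c in b)].append(b)
    return grid

def neighbors_grid(boid, grid):
    # grid approach: only the 27 surrounding cells are searched, O(n) work per boid
    cx, cy, cz = (int(c // CELL) for c in boid)
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                for other in grid[(cx + dx, cy + dy, cz + dz)]:
                    if other is not boid and dist(boid, other) < RADIUS:
                        found.append(other)
    return found

boids = [tuple(random.uniform(0.0, 100.0) for _ in range(3)) for _ in range(5000)]
grid = build_grid(boids)
# both searches find the same neighbors; the grid version just checks far fewer candidates
assert sorted(neighbors_naive(boids[0], boids)) == sorted(neighbors_grid(boids[0], grid))
```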

For another example, let's look at each of the models with a higher density of boids; in this case, we're operating with 10,000 boids, rather than 5,000, in that same space:

![](images/All&#32;Grids,&#32;High&#32;Density,&#32;Block&#32;Size&#32;128.png)

Another notable feature here is that the framerate drops off over time for the uniform grid, while the coherent grid stays relatively steady. The best I can figure for the drop is that, over time, each boid has more neighbors (as more of the boids settle into flocks), and so the number of data accesses to those neighbors increases. This magnifies the penalty of the boid data being more scattered in the uniform grid implementation, as cached data accesses become less favorable.

### Number of Boids

Unsurprisingly, as the number of boids increases, the execution speed of the simulation decreases. Here are some comparisons for all the models, running with `2000` boids, `5000` boids, and `10000` boids:

![](images/No&#32;Grid,&#32;All&#32;Densities,&#32;Block&#32;Size&#32;128.png) ![](images/Uniform&#32;Grid,&#32;All&#32;Densities,&#32;Block&#32;Size&#32;128.png) ![](images/Coherent&#32;Grid,&#32;All&#32;Densities,&#32;Block&#32;Size&#32;128.png)

As expected, the naive implementation shows a roughly linear relationship between simulation speed and the number of boids. The others have a more complex relationship, but the overall trend is clear, and they seem to scale slightly better than linearly with boid count.

### Block Size

The differences between block sizes were very interesting. I ran a series of simulations with block sizes of `32`, `128`, and `512`. Here are a couple of graphs comparing runs with various block sizes:

![](images/No&#32;Grid,&#32;Medium&#32;Density,&#32;All&#32;Block&#32;Sizes.png)![](images/Coherent&#32;Grid,&#32;Medium&#32;Density,&#32;All&#32;Block&#32;Sizes.png)

Notably, there is not much difference in performance for the naive implementation based on block size. This makes some sense: the operations are so simple and easily parallelizable that it is hard to imagine the various levels of scheduling or memory caching making a significant performance difference.

However, for the grid implementations, block sizes made huge differences in outcome.

The block size of 32 ran the worst. This makes some sense: there must be some overhead in launching a block and getting access to the relevant memory, and given how many blocks get spun up at various points in the simulation step, those penalties add up.
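
For a rough sense of how many blocks get spun up, here is a quick back-of-the-envelope count, assuming one thread per boid and the 5000-boid runs from above:

```Python
import math

boids = 5000
for blockSize in (32, 128, 512):
    blocks = math.ceil(boids / blockSize)
    print(f"block size {blockSize:>3}: {blocks} blocks per one-thread-per-boid kernel launch")

# block size  32: 157 blocks per one-thread-per-boid kernel launch
# block size 128: 40 blocks per one-thread-per-boid kernel launch
# block size 512: 10 blocks per one-thread-per-boid kernel launch
```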

This would imply that a larger block size should improve performance; however, we see performance dip when the block size increases from `128` to `512`. The best explanation I can think of is that, because a block's warps must all finish before a new block can take its place, an entire block could be held up by a few rogue warps. In those cases, a whole section of processing power could sit idle while the scheduler keeps that block running.

### Bonus Graph

Everyone needs a little graph gore in their life every now and then:

![](images/All&#32;Test&#32;Runs.png)

### Miscellaneous Notes

During the naive implementation, I switched my `distBetween` function, which computed the distance between two vectors,
between the `glm::distance` function and a simple `sqrt(xdiff * xdiff + ydiff * ydiff + zdiff * zdiff)`.
Though I would have expected `glm::distance` to be highly optimized in some fashion,
I saw the framerate drop from around 10fps to around 2.5fps in the simulation window when using it.
Binary file added images/All Test Runs.png
Binary file added images/No Grid, All Densities, Block Size 128.png
Binary file added images/mHi_dMed_gMed.gif
Binary file added images/mLo_dMed_gMed.gif
Binary file added images/mMed_dMed_gMed.gif
69 changes: 69 additions & 0 deletions outputData/CoherentGrid_HighDensity_128.csv
300,0.698,429.799427
600,1.326,477.707006
900,1.976,462.249615
1200,2.618,468.750000
1500,3.268,461.538462
1800,3.912,466.562986
2100,4.568,458.015267
2400,5.214,465.116279
2700,5.870,458.015267
3000,6.519,462.962963
3300,7.193,445.765230
3600,7.853,455.235205
3900,8.510,457.317073
4200,9.166,458.015267
4500,9.826,455.235205
4800,10.487,454.545455
5100,11.161,445.765230
5400,11.841,441.826215
5700,12.503,453.857791
6000,13.168,451.807229
6300,13.814,465.116279
6600,14.463,462.962963
6900,15.133,449.101796
7200,15.802,449.101796
7500,16.460,455.927052
7800,17.124,452.488688
8100,17.782,456.621005
8400,18.435,459.418070
8700,19.082,465.116279
9000,19.766,439.238653
9300,20.421,458.715596
9600,21.074,460.122699
9900,21.723,462.962963
10200,22.380,457.317073
10500,23.052,447.093890
10800,23.715,453.857791
11100,24.386,447.761194
11400,25.038,460.122699
11700,25.701,452.488688
12000,26.366,452.488688
12300,27.036,448.430493
12600,27.690,459.418070
12900,28.353,453.172205
13200,29.018,451.807229
13500,29.673,458.015267
13800,30.327,459.418070
14100,30.987,455.235205
14400,31.643,458.015267
14700,32.314,447.093890
15000,32.973,456.621005
15300,33.626,460.122699
15600,34.292,451.127820
15900,34.958,451.127820
16200,35.641,439.882698
16500,36.305,452.488688
16800,36.975,448.430493
17100,37.640,451.807229
17400,38.296,458.715596
17700,38.944,462.962963
18000,39.616,447.093890
18300,40.291,445.765230
18600,40.941,462.249615
18900,41.591,461.538462
19200,42.249,456.621005
19500,42.924,445.103858
19800,43.597,447.093890
20100,44.261,452.488688
20400,44.928,449.775112
20700,45.593,452.488688
64 changes: 64 additions & 0 deletions outputData/CoherentGrid_HighDensity_512.csv
300,0.728,412.087912
600,1.408,441.826215
900,2.087,442.477876
1200,2.779,434.153401
1500,3.502,414.937759
1800,4.204,427.960057
2100,4.907,426.742532
2400,5.594,437.317784
2700,6.288,432.900433
3000,6.988,429.184549
3300,7.685,430.416069
3600,8.400,420.168067
3900,9.119,417.827298
4200,9.820,428.571429
4500,10.515,431.654676
4800,11.220,426.742532
5100,11.929,423.131171
5400,12.630,428.571429
5700,13.330,429.799427
6000,14.014,438.596491
6300,14.727,421.940928
6600,15.431,426.742532
6900,16.120,436.046512
7200,16.801,441.176471
7500,17.476,444.444444
7800,18.184,424.328147
8100,18.884,429.184549
8400,19.583,429.799427
8700,20.297,420.757363
9000,21.007,423.131171
9300,21.713,425.531915
9600,22.431,418.410042
9900,23.131,429.184549
10200,23.840,423.728814
10500,24.537,431.034483
10800,25.239,427.960057
11100,25.958,417.246175
11400,26.672,421.348315
11700,27.366,432.900433
12000,28.066,428.571429
12300,28.764,430.416069
12600,29.476,421.940928
12900,30.146,448.430493
13200,30.857,422.535211
13500,31.548,434.153401
13800,32.258,423.131171
14100,32.947,436.046512
14400,33.643,431.654676
14700,34.342,429.799427
15000,35.020,442.477876
15300,35.734,421.348315
15600,36.460,413.223140
15900,37.155,432.276657
16200,37.882,413.223140
16500,38.586,426.742532
16800,39.296,423.131171
17100,40.014,417.827298
17400,40.721,424.929178
17700,41.443,416.088766
18000,42.145,428.571429
18300,42.837,434.153401
18600,43.552,420.168067
18900,44.277,414.937759
19200,44.986,423.728814