Wednesday, 12 November 2014

Research Capstone: Parallel Processing

Parallel Processing - Week 14

The implementation of this week was focussed on preparing the project for testing and performance measuring. The following spreadsheets represent the performance of the software with different variables applied. The values below are calculated as the average processing time over 100 frames.

A* Pathfinding

Number of A* AgentsSequentialThread Pool
00.0001220.000121
20.0001240.000125
50.000130.000129
100.000140.000139
200.0001590.000158
500.0002180.000217
1000.0003810.000385
The chart above shows the performance of the software running a number of A* Pathfinding calculations. The following graph is a visual representation of this (Sequential in Red, Parallel in Blue):astar
Grass Avoidance
The spreadsheet below shows the performance of the grass avoidance which I spoke of last week with various dimensions of grass (200 represents a grid of 200x200 = 40000 grass blades).
Grass DimensionSequentialThread PoolOpenCL
100.0001720.0001780.003674
250.0002360.0001920.00396
500.0004710.0002280.005042
1000.0013830.0003650.009548
2000.0050840.0008610.02056
3000.0115590.0020450.040698
4000.020780.0052950.068533
5000.0330630.0093710.107917
grassThe chart above was expected with a thread pool of 8 threads performing the calculation at a faster rate than a single thread. However the chart below comparing this with OpenCL (shown in Yellow) was completely unexpected. At all stages OpenCL is slower than the other processing methods. I had initially expected it to be slower, however then it would gradually catch up with, then overtake, the other methods.grassgpuNote: The calculation used for the OpenCL measurements is not yet finished. It is lacking in functionality, therefore the final measurements should in fact be even slower than this, unless I'm doing something REALLY wrong. Sending the data in image format, rather than a large array may have been more beneficial to OpenCL.

Parallel Processing - Week 13

This week I began adding in a more expensive pathfinding system using the A* algorithm. This is currently working well using a search depth of 10 on a single thread on the CPU. Higher search depths or larger numbers of AI can cause the program to stutter and the frame rate can become quite irregular at times too.
This A* implementation only calculates the next best waypoint towards the final goal when the current waypoint has been reached.
*VIDEO REMOVED*
In the video demo above I only used one AI agent which follows the cursor around the map using the blue and red grid on the ground to navigate around obstacles. The blue nodes represent paths that the AI can travel on while red nodes are blocked by obstacles (trees in this case).
The next step will be to fix some of the lag spikes which occur (see performance graph) and then move this implementation to OpenCL and the thread pool.

Parallel Processing - Week 12

The subject of this research paper has changed slightly from previous weeks, now focussing entirely on Parallel processing. Distributed processing over the network will be implemented as part of the Game Development Capstone but will no longer be part of this research paper, leaving more time for Parallel processing research.
Since last week, I have added a visual graph which outputs a value over time.  The graph below outputs the processing time for a frame every 0.02 seconds. This gives me a very good visual representation of the difference in average performance between different processing methods.profiler
As processing the random waypoint AI using OpenCL turned out to be MORE expensive than just calculating this on the CPU, I decided to add a large bulk calculation that caused a great drop in performance for the CPU. I transferred an old grass shader to DirectX 11 for this, as I remembered performing collision detection on this used to cause a significant performance loss.
grass
The graph below demonstrates the difference between the time taken to process different parts of the program. It is evident that the grass collision detection is once again causing a significant performance loss.
performance change
Once this is calculated using OpenCL, the performance of this compared to a more complicated AI algorithm in OpenCL may provide an interesting insight into the best use of OpenCL, instead of basing this research paper entirely on AI algorithms.

Parallel Processing - Week 11

As DirectX 11 has removed the font interface from DirectX 10, I created a font renderer to display performance of the program using the geometry shader and a texture file containing all of the letters in the font (see image below). I would like to upgrade this to a graph of some sort to better demonstrate the average frame rate but this might have to wait until later once all of the other features have been implemented.
font
The random waypoint AI has now been implemented in OpenCL and is working as intended.
opencl
While I can't demonstrate that the OpenCL processing is working by the still image above, the on screen text does show that it's active.
From here I will do the same with the Distributed Processing across the network based on the networking implementation using WinSock from my old summative. After this I will add a more complex AI algorithm, such as A*, which can be easily scaled up (by increasing the navigation grid size) to increase the processing times of the program and demonstrate the performance gain / loss of each processing method at different grid sizes.

Parallel Processing - Week 10

Not much progress was made this week in the technical implementation of this research. My thread locks are still not placed properly and some threads are still stealing jobs from other threads before they can be removed from the job list (see below).
threadJobs
This threads job doesn't exist anymore but the thread still tries to process it.
I have begun reading through the book "Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers", by Barry Wilkinson and Michael Allen (2004) to get a better understanding of other ways these techniques may be implemented and other topics such as Load Balancing and Message Passing methods.

Parallel Processing - Week 08

As the holidays were spent working on personal projects, I did not progress with this research capstone during that period.
The purpose of last week has been to implement some of the techniques I have been researching over the last few weeks, in doing this I can find flaws in my design and research improvements in those areas. To begin with I began experimenting with my old threading framework which was using the "WinThread" libraries. After some time I decided to switch to C++11's thread library and looking into different methods of implementing thread pools, as mentioned in previous posts.
thread pool
This thread pool design was mainly inspired by Intel's "Don't Dread Threads" video series, which discussed using "Data Decomposition" to break parts of the program into smaller tasks which can be added to the thread pools Job List (the vector of std::functions shown below). Once created, the worker threads call the Run() function are placed in stasis by a mutex until new jobs are added to the job list. At this point the threads will find a job, remove it from the list and process the job before finding a new job or returning to an idle state if no more jobs are in the job list.
This is currently working really well, the design is easily scalable and works as intended for any number of threads, even just using a single thread. The "Functional Decomposition" method could be implemented along side this, adding the processing of specific components (such as sound or loading screens) to this job list. I have moved this system to my basic random-waypoint chicken framework. The parallel job list is working very well, however threads are currently stealing jobs from other threads before they can remove their job from the list as I am not properly locking the job list. This is causing the system to crash, as the threads are left with invalid data which can not be processed.
I have also began working on adding my previous OpenCL tests to this framework.
testKernell
Each Processing Element (similar to a thread on the GPU) in a Compute Unit (collection of Processing Elements) is assigned a unique ID for each time a kernel is run, this ID is used as the index for each array accessed. So, for now this is just a simple function adding two arrays together, but... next week!! Great things will happen and this should also feature some AI functionality.

Parallel Processing - Week 06

This week was spent creating a basic framework for testing the implementations of this research.
This framework is made from a personal project which has been stripped down to just feature 20 chickens walking around a 10 x 10 grid. Each chicken is given a light, the blue lights represent chickens that are currently in an idle state while white lights are walking chickens.
chickens
This will provide me with a visual representation of the AI for future use. I will begin moving this basic functionality to the GPU, then improve it from there.

Parallel Processing - Week 05

This week was spent researching the use of parallel processing in Artificial Intelligence as this will be my means of demonstrating the different techniques.
The most informative explanation I found for implementing AI with OpenCL was to build the artificial intelligence algorithms to be as close as possible to graphics algorithms. Like writing shaders, but for AI. This will ensure you treat the graphics card in the same fashion, avoiding branching. According to Alex J. Champandard, a GPU compute unit "must follow the “worst case” branch if only one other compute unit takes that path." so while branching may be more efficient for CPU calculations, this is not necessarily the case for GPU calculations.
Data representations should also be treated in a similar fashion as shaders as the transfer of data to and from the GPU may become a bottleneck in your system, regardless of how efficiently OpenCL performs the calculations. These data representations include storing data in images or matrices, in order to make the most of the functions supplied with OpenCL like read_imagef.

Parallel Processing - Week 04

This week was spent researching what areas parallel and distributed processing are commonly used in. Intel's video series "Don’t Dread Threads" was very informative in terms of designing a scalable thread pool for different computer systems.
In "Don't Dread Threads" they stated that "Functional Decomposition" (running separate threads for networking, AI, sound) should not be used. This results in latency, poor load balancing and is not scalable on varying core counts.
Instead use "Data Decomposition". Break groups of tasks (such as AI, networking) into smaller tasks. Add these smaller tasks to a job list and assign threads in a pool to grab these tasks when they're free. Jobs are given "high penalties” for blocking. This method is easier to load balance and, due to the thread pool being of any size, this is scalable for any number of cores, even just a single thread in a pool.
Intel has a range of software which is used for analysing which parts of code are better suited to a multithreading approach. The suggested software are Intel VTune, Intel Parallel Amplifier (displays “hotspots” that would be ideal for parallelism) and Intel Graphics Performance Analyzers (gives you a complete overview of system resources - CPU, GPU, Bus).

Parallel Processing - Week 03

This week has mainly been spent becoming familiar with the CUDA SDK and OpenCL with the AMD APP (Accelerated Parallel Processing) SDK in order to understand the workflow and what kinds of processes could be offloaded to the GPU.
A great series of tutorials can be found here:  https://www.udacity.com/course/cs344 and here: https://www.coursera.org/course/hetero . Both are free online courses with a wide range of exercises.
Next week I will research different design patterns and immutable objects that are used in the games industry.

Parallel Processing - Week 02

Since writing my proposal, the focus of this capstone has slightly shifted to researching Parallel and Distributed Processing in general. While I will still be comparing different parallel processing approaches, such as CUDA, OpenCL and Multi-threading, I also want to compare this to distributing some more expensive calculations across networked clients. This was partially inspired by our networking assignment from the previous year.
net4
I was performing all calculations server-side meaning each client just had to render the scene. The servers calculations were multithreaded (one thread per client) and this worked really well for 2 - 10 players, however anymore caused the performance of the server to drop significantly. Distributing these calculations across the clients would have solved this problem and would definitely benefit our Titan group project.

Parallel Processing - Week 01

The first week of this paper has been spent deciding upon a topic for this capstone project. Ideally this would tie in with our large group project and possibly even Liwaa's Capstone paper. Below are some ideas I was throwing around:
  • Multithreaded AI system (possibly using CUDA / OpenCL)
  • AI for real-world applications, such as simulating a fire escape, traffic or creating aerodynamic shapes using Genetic Algorithms
  • A Deferred Renderer
  • Component system with prefabs, similar to Unity. Would tie in well with a level editor
  • Creating a dynamic navmesh or flowfield navigation, ties in well with Titan project.
After a discussion with Liwaa and Tece on Monday, I've decided to go with researching designs for multi-threaded systems. This can be tested by researching various AI implementations from Liwaa's capstone project and can be expanded upon by comparing the performance of multi-threading versus other parallel methods, such as CUDA, OpenCL or even passing specific calculations to other network clients.

Friday, 6 December 2013

Personal Projects

______________________________________________________________________________

3D Civilisation

I decided to give myself a project over the holidays to keep my brain active. I've always wanted to make some sort of civilisation simulator so decided now was a good time to start it. This isn't exactly a game and it's not something I am planning on publishing or anything, just trying to simulate a group of people surviving, generating food, building houses. The player may join in with the world but will have as much impact as the AI characters. I'm hoping this will turn into a nice summary of everything I've learned at MDS so far, with AI, rendering techniques I didn't get to try out during class and perhaps some multithreading to increase the performance of the AI. We'll see how it goes...
The aim of this is to get AI that will make decisions by itself based on what it decides is best for the state of the village, instead of having a fixed set of commands or particular state.
Started building this using the OpenGL framework but have switched to DirectX again now.

So far I have only really been focusing on rendering techniques. The AI currently just moves to random waypoints and avoids obstacles. Skeletal Animation of my chickens is nearly complete, using Assimp, however some vertices are still flying off to infinity.



Christmas Update
I've changed all of the 2D billboard sprites into 3D models, still just chickens, humans and trees. Since week 01 I've mainly just been working on changing my DirectX framework to be more like my OpenGL one, as I much prefer that framework. I've changed the pipeline to suit a deferred shading pipeline. Deferred lighting is now working really well, although I'm not yet rendering the lights to a texture as objects. Once this is done the performance will be much better. I've recently also added grass to the terrain using the Geometry Shader. This turned out to be much more efficient than I thought it would be. I'm currently rendering 250,000 grass blades at a time and maintaining a decent framerate. Will switch to developing on my laptop from now on in order to optimise the game for that.




Start of Production
Currently featuring a couple of humans and their chickens, in a spiral forest.
Below are some screenshots of the state of the framework so far.

Added lighting and radial blur





______________________________________________________________________________

OpenGL Renderer

Started working on an OpenGL Renderer using Glew to handle the extension libraries for me.
Used this framework for the cloth simulation summative for Media Design School.

Got some inspiration while playing Arma 3 to try turning my OpenGL framework into some sort of flight simulator. Obviously nothing like Arma but it gave me a purpose and target to work towards with this framework, so decided to make use of my OBJ reader and imported a REALLY ROUGH helicopter and fixed wing model. Both the fixed wing and helicopter can fly at the moment but they require a lot more work. It's pretty horrible at the moment.
3rd Person cam
Fixed wing cockpit
Added "head" rotation independent of ship rotation




Saturday, 16 November 2013

MDS - 2nd Year

______________________________________________________________________________

Do You Even Lift?

This is a local multiplayer forklift puzzle game for two players. Forklifts must work together to solve puzzles and exit the level, while the boss robot attempts to stop their escape.
Amongst other things, I am responsible for the fork lifting mechanic, the Boss robot AI, the Level select scene and writing shaders.

This game is also made using Unity with C#. We presented this game at the December Game Developers Meetup, 2013.

Team Members
Artists: Shawn Elliot, Dean Barile, Josh Rawlings and Natanel Apfel
Programmers: Chris Johnson, Ben Carnall and Myself








______________________________________________________________________________

Magical Train Adventure

The purpose of this assignment was to give the first years their first group game experience. The game had to be a non-violent game featuring a certain constraint. Our constraint was that the game had to feature an umbrella, or an object which was used in a way other than how it's intended.

The game our team decided upon was a cooperative network multiplayer game in which the players have to work together to control a train. Each player uses an android device as a controller to interact with the train on the server, running on PC. In order to achieve this we used Unity with C#. We presented this game at the December Game Developers Meetup, 2013.

Team Members
Artists: Lena Yaroshenko, Tom Lowbridge and Terai Tauvavau
Programmers: Steve Johnson and Myself
Our team consists entirely of second year students.

The following video demonstrates a driver controlling the height of the train tracks. Once the required upgrade is unlocked they take the train to space.


Each player has two scenes on their android device.
Inventor Device
Left:Inventors upgrade bench with mixing bowl
Right: Puzzle scene for combining elements to make upgrades

Engineer Device
Left: Engineers furnace - shovel coal to give power to driver
Right - Passenger compartment - they may get set on fire in the presence of umbrellas



Driver Device
Left: Driver Scene - move handle to make train go faster
Right: Handle controls height of train tracks. Upgrades required to move to space or underwater

______________________________________________________________________________

Decap-a-Weta

This assignment is a group project. The purpose of this assignment was to produce a game prototype while closely following a chosen Software Engineering Methodology. Our team chose to follow Agile.
I acted as Team Leader for this assignment. Fortunately I had a fantastic team who managed themselves well and thus required very little leading.
The project started by our tutor going around MDS asking people to put words into a hat which they thought would be really hard to make a multiplayer game out of. We drew the word "Weta". Our initial inspiration came from games such as Joust (1982).
The programming tasks were divided equally between us and, as our team consisted entirely of programmers, Ben and I created the art for the game. 

Decap-a-Weta is a battle arena game featuring people armed with javelins riding on the backs of wetas. The characters can climb walls and jump high. The aim is to get a higher kill count than the other players within the given time period. For the prototype we created two levels, each with a unique environmental obstacle which the players had to avoid. For the forest level, a giant eagle would attack the player from one of the four corners, while in the "Goo" level the players could die by falling into a fit of goo through holes in the level.
This game is created using Unity. For most of us, this was our first experience with Unity and C#.

Team Members: Ben Carnall, Chris Johnson, Steve Johnson and Myself.

Final Grade: A-

The screenshots below don't quite do the game justice, it gets very hectic with four players and the eagle.



______________________________________________________________________________

Physics

Final Physics Grade: A-

Summative 2: Cloth Simulation

For this summative we had to make a cloth simulation. 
The cloth below features collisions with the ground, sphere and itself. It can be dragged and torn with the mouse and is blown by the wind. The scene has adjustable wind levels using the + and - keys. The pin points at the top of the cloth (similar to curtain loops) can be moved closer to each other and further apart.
For this I used C++ with OpenGL for rendering.

Curtain Ring movement
Sphere collision
Cloth Tearing
Torn Cloth

Summative 1: Grumpy Birds

The aim of this assignment was to create a game of "Grumpy Birds" using the Box2D engine.
For this I used C++ with DirectX 10.



______________________________________________________________________________

Networking & Multithreading

This assignment was to make a game called "Candy Grab" using Networking and Multithreading.
In Candy Grab, players must run around a store collecting candy. Each pickup grants the player 15 extra health, colliding with another player or wall reduces their health by 5. Once the player's health reaches 0, they are kicked out of the store. While a large number of players may join the game, only 4 players may be inside the store at any one time, after this the doors are locked (blue squares at the top-left). If a player is kicked out of the store, the doors are opened again.
I wanted all of the major calculations to take place on the server side. For this to work efficiently multi-threading is used to process the player input and collision detection. Once a player joins the server, they are each assigned a thread. Each frame the client sends the user input to the server, each server thread will process the input of the player it is assigned to and, once all input has been processed. After this it checks for collisions between their new positions and responds to these accordingly. Once all collisions have been processed the final positions of each player are sent back to the client for rendering.
This design allows the client executables to focus almost entirely on rendering alone while the server manages the rest.

For this I used C++ with DirectX 10 for rendering.

Final Artificial Intelligence Grade: A+





______________________________________________________________________________

MySQL Database Summative "Undead Moa"

This assignment was to utilise MySQL to populate a database of student, lecturer, summatives and other information. This was a group assignment. We decided we wanted to make this database interface something interesting and fun to use, so we made an interactive cube interface. The cube can be rotated across the Y-Axis only and each face of the cube acts as a separate menu. This allows the user to have four databases open at once (one for each face of the cube), each rendered to a texture and placed on the cube.
The user can click on a menu item and edit the contents, this will save it back into the MySQL database.
We named the interface "Undead Moa" in honour of the old Media Design School intranet.

Team Members: Steven Johnson, Chris Johnson, Declan Wong, Harrison Thrupp and Myself
Database Grade: A







Undead Moa with two different databases open.



______________________________________________________________________________

Artificial Intelligence

The purpose of this assignment was to demonstrate an understanding of different AI techniques. We had to chose five techniques from a list and swap between these in the same executable. Other AI summatives included behaviour trees, minmax trees, binary trees, A* Path finding and solutions to Tower of Hanoi and Connect 4.
For this I used C++ with DirectX 10 for rendering.

Final Artificial Intelligence Grade: A


1: Obstacle avoidance: They avoid fixed obstacles, other AI and the screen walls. Large circles represent fixed obstacles in the scene.

2: Seek
3: Evasion / Flee
 4: Flow Field following
 5: Path following

______________________________________________________________________________

DirectX 10 Graphics & HLSL

The purpose of this assignment was to demonstrate a graphics technique using DirectX 10's shader pipeline.
I chose shadow mapping for this assignment. This demo features soft shadow mapping, adjustable bloom, adjustable colour correction, normal maps, a skybox and diamond-square generated terrain.
The light / shadow map texture can be seen in the top-right viewport.

Final Graphics Grade: A