Part 2 - Latency Meter and Simulation
June 13, 2019

Attention: This tutorial was written for Godot 3. Not only does the information here contain several flaws, but networking has also changed significantly in Godot 4.

In the previous part we worked on a bare-bones authoritative server implementation, while also mentioning a bunch of possible problems that can arise because of network latency. However, out of the box we can't verify any of those issues with Godot. That's because we can't exactly run the game and specify a latency value to be simulated by the engine. In this part we will work around that and incorporate a system that will allow us to simulate this delay in the networking.

But before doing that we will work around yet another limitation. Again, out of the box we can't measure the latency, also known as ping. Although we can't perform true ping requests, we can still measure an approximate latency value. I will show how to do that.

Ping/Pong

As I have mentioned, we can't perform true ping requests with Godot, at least not out of the box. The reason is that those requests use the Internet Control Message Protocol (ICMP), which sits on a different layer than the packets we can work with in Godot. To be more specific, ICMP packets (such as pings) are in the network layer (layer 3), whereas the packets we can work with in Godot (TCP or UDP) are in the transport layer (layer 4). You can read more about that here and here. One very important aspect of ICMP packets is that they don't need a connection to be established.

Knowing about that, the latency measurement implemented here brings the limitation that we have to send the "ping" request to a destination that is previously expecting our kind of packet. Another thing to keep in mind is related to the accuracy. It will probably not be the same value given by true ping requests.

With that disclaimer out of the way, let's see how to implement this. First the very basic idea of the system:

  1. Machine 1 calls a remote function on Machine 2 and starts counting time.
  2. Once inside the remote function, Machine 2 answers back by calling a different remote function on Machine 1.
  3. When in the "answer function", take the elapsed time, which indicates the latency.
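
The three steps above can be sketched roughly as follows. This is only an illustration under a few assumptions: the names send_sketch_ping(), sketch_ping(), sketch_pong() and sent_at are hypothetical and not part of the project code, the network peer is assumed to be already set up, and only one request is in flight at a time. It also reads the clock through OS.get_ticks_msec() instead of using the timer-based approach developed in the rest of this part.

```gdscript
extends Node

# Hypothetical: moment (in milliseconds) at which the last request was sent
var sent_at = 0

# Step 1: runs on machine 1. Store the current time and call machine 2.
func send_sketch_ping(peer_id):
	sent_at = OS.get_ticks_msec()
	rpc_unreliable_id(peer_id, "sketch_ping")

# Step 2: runs on machine 2. Answer back to whoever called us.
remote func sketch_ping():
	rpc_unreliable_id(get_tree().get_rpc_sender_id(), "sketch_pong")

# Step 3: runs on machine 1. The elapsed time is the round trip time.
remote func sketch_pong():
	var elapsed = OS.get_ticks_msec() - sent_at
	print("Round trip took ", elapsed, " ms")
```

Note that the value obtained this way is the full round trip, and that a lost packet simply means sketch_pong() is never called, which is why the real implementation needs a timeout.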

Now let's go into a little bit more detail before implementing. The very first thing to keep in mind is the fact that measuring this just once is almost useless, simply because network conditions potentially change all the time. Therefore we have to create a system to continuously send "ping requests". But of course we can't just send a request at each loop iteration, otherwise we may "clog" the network, so we will create a ping_interval value that will tell how long (in seconds) we have to wait before sending another packet.

We will use the unreliable protocol because it's the fastest one. But because unreliable packets may never arrive, we have to create a timeout, which will be done through another property named ping_timeout. When this timer expires we increase an internal counter indicating the number of lost packets, then immediately send another ping request and restart the timer associated with ping_timeout.

To facilitate our work we will use the Timer class provided by Godot. We will use a single one for both ping_interval and ping_timeout since we don't need both running at the same time. The idea here is that once a packet is sent we begin counting ping_timeout seconds. If we get a response before that time, we reset the timer to ping_interval and wait for it to expire. Once it does, we send another packet while resetting the timer to ping_timeout. And the loop continues until the player disconnects. Below is a somewhat simplistic flowchart of the system:

Then we have to think exactly about the target ping/pong requests. Of course the client wants to know the latency to the server. But the server also wants to know the latency to every single connected client. So the system we will implement consists in the server creating a "ping/pong" loop to each of the connected players. The measured values are then relayed to the clients. The interesting thing here is that each client will be able to show, maybe in a score window, the measured latency values for each one of the other players. To improve things a little bit, every time we send a "ping request" to a client, we will also send the last measured latency value so the peer in question can have a faster updating of their own measured "pings".

With all that information we can now declare the control properties that we will need:

network.gd
const ping_interval = 1.0           # Wait one second between ping requests
const ping_timeout = 5.0            # Wait 5 seconds before considering a ping request as lost

# This dictionary holds an entry for each connected player and will keep the necessary data to perform
# ping/pong requests. This will be filled only on the server
var ping_data = {}

Now let's fill the ping_data dictionary. As mentioned in the comment, it's meant to be used only by the server. We currently have an empty function handling the event of a new player getting connected to the server. This event is sent to everyone connected, but we want only the server to do something, which is to fill the internal data. Each entry in the dictionary will itself be another dictionary, indexed by the unique ID of the connected player. Each entry of ping_data needs four fields: timer (the Timer object that controls the ping/pong loop), signature (used to match ping/pong packets), packet_lost (the number of lost packets) and last_ping (the last measured time taken to get an answer from the peer).

Ok, so when the client gets connected (and we are on the server), we create an entry in ping_data containing those fields. After that we initialize the timer object: make it one-shot, set its wait time to ping_interval seconds (so the loop can begin there), tell it to count time using the idle process rather than the physics process, assign the function that will be called when the timeout occurs and, finally, set the node name of the timer. What must be kept in mind is that timers only get updated when they are part of the node tree, and that we will have to remove the timer once the client disconnects. Once the timer is set up it is added to the tree and the dictionary entry is stored in ping_data. Finally we explicitly start the timer, since a Timer created from code does not start on its own unless its autostart property is enabled:

network.gd
func _on_player_connected(id):
	if (get_tree().is_network_server()):
		# Initialize the ping data entry
		var ping_entry = {
			timer = Timer.new(),          # Timer object to control the ping/pong loop
			signature = 0,                # Used to match ping/pong packets
			packet_lost = 0,              # Count number of lost packets
			last_ping = 0,                # Last measured time taken to get an answer from the peer
		}
		
		# Setup the timer
		ping_entry.timer.one_shot = true
		ping_entry.timer.wait_time = ping_interval
		ping_entry.timer.process_mode = Timer.TIMER_PROCESS_IDLE
		ping_entry.timer.connect("timeout", self, "_on_ping_interval", [id], CONNECT_ONESHOT)
		ping_entry.timer.set_name("ping_timer_" + str(id))
		
		# Timers need to be part of the tree otherwise they are not updated and never fire up the timeout event
		add_child(ping_entry.timer)
		# Add the entry to the dictionary
		ping_data[id] = ping_entry
		# Start the timer explicitly (it only starts on its own if autostart is enabled)
		ping_entry.timer.start()

As you can see, when the ping timer expires it will call the _on_ping_interval() function, which receives the unique ID of the client as a parameter. The CONNECT_ONESHOT flag given to connect() means that the function will be disconnected once the signal is emitted. We do this to make things simpler when switching between the ping interval and ping timeout timers. Before working on _on_ping_interval(), let's work on some other functions.

The first one is request_ping(). In this function we first connect to the timer the function that is meant to be called when the ping timeout expires (by default 5 seconds). We then start the timer and call a remote function named on_ping(), which is meant to run only on the client:

network.gd
func request_ping(dest_id):
	# Configure the timer
	ping_data[dest_id].timer.connect("timeout", self, "_on_ping_timeout", [dest_id], CONNECT_ONESHOT)
	# Start the timer
	ping_data[dest_id].timer.start(ping_timeout)
	# Call the remote machine
	rpc_unreliable_id(dest_id, "on_ping", ping_data[dest_id].signature, ping_data[dest_id].last_ping)

As you can see from the rpc() call, we are sending the signature as well as the last measured ping value to the on_ping function. When in this function, we call the server back giving the signature of the ping request. After that we emit a signal (which will be created shortly) meant to indicate that we have a new measured ping value:

network.gd
remote func on_ping(signature, last_ping):
	# Call the server back
	rpc_unreliable_id(1, "on_pong", signature)
	# Tell this client that there is a new measured ping value - yes, this corresponds to the last request
	emit_signal("ping_updated", get_tree().get_network_unique_id(), last_ping)

And then the answer is received in a new function that we have to create, on_pong(). Since this function is meant to run only on the server, we perform this check at the beginning and bail if that is not the case. The function expects the signature of the request, so we can check whether the answer matches the packet that was sent. If it does, we use the time_left property of the timer object to calculate the elapsed time, converting from seconds to milliseconds. The network ID of the caller can be obtained through the get_rpc_sender_id() function. Another task of this function is to start the ping interval timer again, so the ping/pong loop can continue. Then we broadcast the measured value to every connected client and emit the ping_updated signal so the server can also do something with the value within its user interface:

network.gd
remote func on_pong(signature):
	# Bail if not the server
	if (!get_tree().is_network_server()):
		return
	
	# Obtain the unique ID of the caller
	var peer_id = get_tree().get_rpc_sender_id()
	
	# Check if the answer matches the expected one
	if (ping_data[peer_id].signature == signature):
		# It does. Calculate the elapsed time, in milliseconds
		ping_data[peer_id].last_ping = (ping_timeout - ping_data[peer_id].timer.time_left) * 1000
		# If here, the ping timeout timer is running but must be configured now for the ping interval
		ping_data[peer_id].timer.stop()
		ping_data[peer_id].timer.disconnect("timeout", self, "_on_ping_timeout")
		ping_data[peer_id].timer.connect("timeout", self, "_on_ping_interval", [peer_id], CONNECT_ONESHOT)
		ping_data[peer_id].timer.start(ping_interval)
		# Broadcast the new value to everyone
		rpc_unreliable("ping_value_changed", peer_id, ping_data[peer_id].last_ping)
		# And allow the server to do something with this value
		emit_signal("ping_updated", peer_id, ping_data[peer_id].last_ping)

Next we work on the remote function that is called by the server when there is a new measured ping. This function receives the ID of the peer that triggered the call as well as the measured value. All that this function must do is emit the signal so the other code can act on this event:

network.gd
remote func ping_value_changed(peer_id, value):
	emit_signal("ping_updated", peer_id, value)

We can finally work on the _on_ping_timeout() and _on_ping_interval() functions. In the first case we want to update both the signature and the lost packet count associated with the network ID given in the function's argument. After that, we request a new ping just by calling the request_ping() function, which takes care of configuring the timer object. As for the ping interval, we update the signature and then call the request_ping() function. That's it! The ping/pong loop is set up:

network.gd
func _on_ping_timeout(peer_id):
	print("Ping timeout, destination peer ", peer_id)
	# The last ping request has timed out. No answer received, so assume the packet has been lost
	ping_data[peer_id].packet_lost += 1
	# Update the ping signature that will be sent in the next request
	ping_data[peer_id].signature += 1
	# And request a new ping - no need to wait since we have already waited 5 seconds!
	request_ping(peer_id)

func _on_ping_interval(peer_id):
	# Update the ping signature then request it
	ping_data[peer_id].signature += 1
	request_ping(peer_id)

EDIT: There is a problem with this code, though. When packets are lost the following message can be thrown by the Godot system:

ERROR: Signal 'timeout' is already connected to given method '_on_ping_timeout' in that object.
   At: core/object.cpp:1476 (ETA: stack trace shows network.gd:214)
ERROR: Nonexistent signal: timeout
   At: core/object.cpp:1533 (ETA: stack trace shows network.gd:254)

[cheesegreater] has found the root of the problem and the solution for it! What happens is that one-shot connections are not removed until the next loop iteration, so when a packet is lost request_ping() tries to connect the _on_ping_timeout() function to the timer's timeout signal again. The solution is to postpone the call to request_ping() from _on_ping_timeout() through the call_deferred() function, which takes the name of the method as a string followed by its arguments. The code then becomes like this:

network.gd
func _on_ping_timeout(peer_id):
	print("Ping timeout, destination peer ", peer_id)
	# The last ping request has timed out. No answer received, so assume the packet has been lost
	ping_data[peer_id].packet_lost += 1
	# Update the ping signature that will be sent in the next request
	ping_data[peer_id].signature += 1
	# And request a new ping - we need to wait until the end of the update so that the existing connection gets removed
	call_deferred("request_ping", peer_id)

Of course we have to declare the signal that we have been using:

network.gd
signal server_created                          # When server is successfully created
signal join_success                            # When the peer successfully joins a server
signal join_fail                               # Failed to join a server
signal player_list_changed                     # List of players has been changed
signal player_removed(pinfo)                   # A player has been removed from the list
signal disconnected                            # So outside code can act to disconnections from the server
signal ping_updated(peer, value)               # When the ping value has been updated

One last thing is cleaning up the timer object as well as the ping data entry whenever a client disconnects from the server. We already have the function that is called on this event, _on_player_disconnected(), and it's already performing some other cleanup. In there, before we unregister the player data (both on the server and on the other clients), we first clean up the ping data:

network.gd
func _on_player_disconnected(id):
	print("Player ", players[id].name, " disconnected from server")
	# Update the player tables
	if (get_tree().is_network_server()):
		# Make sure the timer is stopped
		ping_data[id].timer.stop()
		# Remove the timer from the tree
		ping_data[id].timer.queue_free()
		# And from the ping_data dictionary
		ping_data.erase(id)
		
		# Unregister the player from the server's list
		unregister_player(id)
		# Then on all remaining peers
		rpc("unregister_player", id)

GUI Controls For the Ping

Ok, now that we have means to measure the latency, we need to show it somewhere. So let's quickly work on the GUI to display the values. We will somewhat revamp the player list, which will also serve as the base for part 4 where we add means to kick players from the server. What we will do is create a new scene that will be instanced once per connected player. This scene will hold GUI controls and script logic to perform some of the tasks we want. One of them is to display the measured latency associated with the player.

All that said, create a new scene and choose MenuButton as root node and rename it to PlayerEntryRoot. This will allow us to attach a popup menu into this "control". We will work on that at a later moment but for now let's focus on the player name plus its measured latency. Expand the width of the button a little. For reference, mine resulted in 210 x 20.

Because we want to place a few controls in a row, add one HBoxContainer control inside the PlayerEntryRoot, name it PlayerRow (the name used by the script further down) and then change its layout setting to Full Rect so it fully uses the dimensions of the button.

Just to "spice" things up a little bit we will add an image representing the player's avatar. So, add one TextureRect control into the horizontal box and name it Icon. We have to change some properties now. First, enable Expand so we can resize the texture to a smaller size than the chosen image. Now expand the Rect category and change the Min Size to 20 x 20 (the same height as the PlayerEntryRoot). By doing this the texture will take up space in the horizontal box, allowing other controls to be placed at its side.

Then we need a Label control to hold the player name. So, add one into the PlayerRow and rename it to lblName. Change its Valign property to Center so the text appears vertically centered.

Finally, we need another Label to hold the measured latency value. So, add another one and rename it to lblLatency. As with lblName, change the Valign property to Center.

Now save the scene and name the file ui_player_list_entry.tscn. Next, attach a new empty script file to the PlayerEntryRoot. The suggested name, ui_player_list_entry.gd is fine. At this point it should contain a single line, extends MenuButton.

We will dynamically spawn instances of this scene, substituting the Label control. At that moment we have to provide some data so the correct information is shown. So, let's create one function meant to receive the player name as well as the modulation color of the player's avatar and apply them to the controls:

ui_player_list_entry.gd
func set_info(pname, pcolor):
	$PlayerRow/lblName.text = pname
	$PlayerRow/Icon.modulate = pcolor

We also need a function to update the lblLatency label. It converts the value to an integer (just to truncate it), then converts that into a string and assigns it to the text property of lblLatency. Also, we surround the value with "()":

ui_player_list_entry.gd
func set_latency(v):
	$PlayerRow/lblLatency.text = "(" + str(int(v)) + ")"

Now, in the script where we populate the player list we have to substitute the label with this new control. The new code basically consists of loading the scene ("ui_player_list_entry.tscn") and then creating an instance for each entry in the player list. With the instance we call the set_info() function we have just created and then proceed as before by adding it into the list box. But there is something else we have to do. We must keep track of the correct entry instance in order to update the ping value with the set_latency() function. The easiest approach here is to maintain a dictionary where the key is the player's unique network ID and the value is the instance of the created control. In this function we also clear the contents of this dictionary, since we clear the contents of the list box as well. Anyway, let's declare the dictionary:

game_world.gd
var player_row = {}

And then, the new code of the function that updates the list looks like this:

game_world.gd
func _on_player_list_changed():
	# First remove all children from the boxList widget
	for c in $HUD/PanelPlayerList/boxList.get_children():
		c.queue_free()
	
	# Reset the row dictionary
	player_row.clear()
	
	# Preload the entry control
	var entry_class = load("res://ui_player_list_entry.tscn")
	
	# Now iterate through the player list creating a new entry into the boxList
	for p in network.players:
		if (p != gamestate.player_info.net_id):
			var nentry = entry_class.instance()
			nentry.set_info(network.players[p].name, network.players[p].char_color)
			$HUD/PanelPlayerList/boxList.add_child(nentry)
			player_row[network.players[p].net_id] = nentry

Early in this part we have created a new signal in the networking code meant to indicate the fact that a ping measurement has been changed. We will then connect a new function into it that will be used to update the player's row control. The connection code:

game_world.gd
func _ready():
	# Connect event handler to the player_list_changed signal
	network.connect("player_list_changed", self, "_on_player_list_changed")
	# Must act if disconnected from the server
	network.connect("disconnected", self, "_on_disconnected")
	# Once a new ping measurement is given, let's update the value within the HUD
	network.connect("ping_updated", self, "_on_ping_updated")

   ... # Previous code

Shortly we will work on the _on_ping_updated() function. First we need a place in the HUD to tell the measured latency of the local client towards the server. For that, add a new panel into the HUD node and name it PanelServerInfo. You can position it wherever you desire. For reference I placed it roughly in the top-center region of the HUD. As with the PanelPlayerList, change the Self Modulate property to make it transparent.

Now, inside this panel add a label and name it lblServerName. Its purpose should be pretty obvious, huh? Then add another label into this panel and name it lblPing.

We can now work on the _on_ping_updated() function. In its body we have to check the ID of the client that got the updated value. If it corresponds to the local player then we have to update the lblPing control that we have just added. Otherwise, we locate the ui_player_list_entry instance using the dictionary and call the set_latency() function:

game_world.gd
func _on_ping_updated(peer_id, value):
	if (peer_id == gamestate.player_info.net_id):
		# Updating the ping for local machine
		$HUD/PanelServerInfo/lblPing.text = "Ping: " + str(int(value))
	else:
		# Updating the ping for someone else in the game
		if (player_row.has(peer_id)):
			player_row[peer_id].set_latency(value)

Running the test now should give the measured ping for the connected players. There are a few things, however:

  1. We created a label to show the name of the server but didn't use it.
  2. From the server, it doesn't make sense to have a "local ping".
  3. The shown value is constantly 0.

Well, we will fix 1 and 2 promptly! First we need to actually give the information to the connected client. For this we need a new remote function meant to receive the server information and assign into the local server_info dictionary. In it we only assign if not in the server:

network.gd
remote func get_server_info(sinfo):
	if (!get_tree().is_network_server()):
		server_info = sinfo

Once the player gets into the server we have to call this function:

network.gd
func _on_player_connected(id):
	if (get_tree().is_network_server()):
		# Send the server info to the player
		rpc_id(id, "get_server_info", server_info)
		
		... # Previous code

Next, when the game world is loaded we check if we are in the server. In that case we just disable the server info panel because it doesn't make any sense to be there:

game_world.gd
func _ready():
	... # Previous code
	
	# Update the lblLocalPlayer label widget to display the local player name
	$HUD/PanelPlayerList/lblLocalPlayer.text = gamestate.player_info.name
	
	# Hide the server info panel if on the server - it doesn't make any sense anyway
	if (get_tree().is_network_server()):
		$HUD/PanelServerInfo.hide()
   
   ... # Previous code

Why not stuff the server name setup within an else: statement? Well, chances are high that we get into the _ready() function before receiving the remote call from the server providing the necessary information. So, what we will do is "hijack" the player list changed event and just update the server name from there. Ideally we should create a signal and connect a function to it, but for simplicity's sake I will just go with this... err, "shortcut":

game_world.gd
func _on_player_list_changed():
	# Update the server name
	$HUD/PanelServerInfo/lblServerName.text = "Server: " + network.server_info.name

   ... # Previous code

Latency Simulator

Now, how about the latency values being constantly 0? Well, that's the case when testing the network with multiple instances running on the same machine. We obviously want to test this, so we have to come up with some kind of latency simulation. We will do so now. But how?

Well, we will use coroutines for that. In other words, the yield() function. More specifically we will use yield() with a timer. Ok, perhaps I will have to go back a little bit here because this can be somewhat confusing for some people. If you already know how to use this function, you will be able to skip a few paragraphs, otherwise keep reading.

Let's take part of the description of this function from the Godot's documentation:

Stops the function execution and returns the current suspended state to the calling function.

To better explain this, suppose function Alice() calls function Bob(). Then within the Bob() function there is a yield(). At that moment the execution of Bob() will be paused and the execution will return to Alice() at the point where Bob() was called. But the state of the Bob() function will be returned as an object that can be used to resume the execution from the point where yield() was called. Let's see this as a snippet of code:

func Alice():
	print("Do Alice stuff 1")
	var bob_state = Bob()
	print("Do Alice stuff 2")
	bob_state.resume()

func Bob():
	print("Do Bob stuff 1")
	yield()
	print("Do Bob stuff 2")

  1. Alice() does some stuff then calls Bob()
  2. Bob() does some stuff then yields
  3. Alice() does some more stuff then asks Bob() to resume
  4. Bob() does some more stuff.

This results in the following printed text:

Do Alice stuff 1
Do Bob stuff 1
Do Alice stuff 2
Do Bob stuff 2

The idea here is to perform some delay before dealing with the network packets, but without halting the rest of the simulation. There is one very interesting way to use yield() for that. If we provide an object and a signal, the function will resume once that signal is emitted. That brings the idea of creating a timer object and using its timeout signal to resume the function from the yield() call.

But then, where should we delay the execution? Well, at the entry point of every single remote function. Yes, I know, that's rather unfortunate. To make matters worse we can't create a function like delay_execution() and place the yield() within it. Why? Because if we do so, only that helper function will be paused and execution will return directly to the remote function, which then continues without any delay. Luckily Godot provides a function to create temporary one-shot timers that is perfect for our case! In fact, in the function's description within the documentation we even have an example snippet, which is precisely this:

func some_func():
	print("start")
	yield(get_tree().create_timer(1.0), "timeout")
	print("end")

The really nice thing here is the fact that the created timer will be automatically deleted when the timeout signal is emitted, meaning that we don't need to deal with this cleanup!
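
To make the earlier point about a delay_execution() helper more concrete, here is a small sketch; the three function names are made up for illustration and are not part of the project. Calling the helper on its own does not delay the caller. In Godot 3 the caller can wait for the helper by yielding on the completed signal of the returned function state, but that still means a yield() at the entry point of each remote function, so the helper does not really save us anything.

```gdscript
extends Node

# Hypothetical helper: the yield() pauses only this function
func delay_execution(seconds):
	yield(get_tree().create_timer(seconds), "timeout")

func without_delay():
	# delay_execution() returns a function state object right away,
	# so the next line runs immediately
	delay_execution(1.0)
	print("printed without waiting")

func with_delay():
	# Waiting on the "completed" signal of the returned state does delay
	# this function, but it still requires a yield() right here
	yield(delay_execution(1.0), "completed")
	print("printed after roughly one second")
```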

Ok then, let's begin working on the code. First we want a variable that will hold the amount of desired simulated latency. We will call it fake_latency. Whenever this value is bigger than 0 then we will make the delay.

network.gd
# If bigger than 0, specifies the amount of simulated latency, in milliseconds
var fake_latency = 0

Now comes the frustrating task. At the beginning of every single remote function we have to add some code. First verify if the fake_latency value is bigger than 0. If so, we call yield() providing the timer created by the create_timer() function, converting the fake latency that is in milliseconds to seconds, which is the unit expected by the timer function. As an example the register_player() function now begins like this:

network.gd
remote func register_player(pinfo):
	if (fake_latency > 0):
		yield(get_tree().create_timer(fake_latency / 1000.0), "timeout")
	
	if (get_tree().is_network_server()):
   ... # previous code

In the case of a remote function running outside of the network.gd, the code looks like this:

player.gd
remote func server_get_player_input(input):
	if (network.fake_latency > 0):
		yield(get_tree().create_timer(network.fake_latency / 1000.0), "timeout")
		
	... # previous code
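
A small detail about the conversion from milliseconds to seconds: in GDScript, dividing one integer by another gives an integer result. The sketch below (the variable names are only for illustration) shows the difference. If fake_latency can ever hold an integer, dividing by 1000.0 keeps the fractional part.

```gdscript
extends Node

func _ready():
	var latency_int = 50      # integer, like the initial "var fake_latency = 0"
	var latency_float = 50.0  # float

	print(latency_int / 1000)    # integer division, the fraction is dropped (0)
	print(latency_int / 1000.0)  # float division (0.05)
	print(latency_float / 1000)  # float division as well (0.05)
```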

Just to make things a little bit easier to check, all of the remote functions we have declared in the network.gd file:

The same kind of function but in the game_world.gd file:

As for the player.gd file:

Manipulating the Latency Simulation

Now that we have added some way to fake latency, we need means to change the fake_latency variable so we can test things while the game is running. For that we will add some new things into the HUD. In the game world scene add a new panel into the HUD node, naming it PanelFakeLatency. As with the other panels, change the Self Modulate property to be transparent.

Add a label into the PanelFakeLatency and change its Text property to Simulate latency. Below it add one SpinBox and name it txtFakeLatency. Its default range ([0..100]) is not enough for us, so change the Max Value property to something like 2000 (2 seconds). Then we connect a function to the value_changed signal, which will be used to manipulate the network.fake_latency variable:

game_world.gd
func _on_txtFakeLatency_value_changed(value):
	network.fake_latency = value

If you try to test the game now you will notice that once the spin box gets focus, the focus can't be removed, even by clicking outside of it. The fix is a... err... hack. First add a button into the HUD node, name it btClearFocus and change its layout to Full Rect so it covers the entire screen. Next, in the scene hierarchy, drag it to be the first child of the HUD so it will be rendered first and any control above it will have a chance to get input events. Then expand the Visibility category and change the Self Modulate property to a transparent color, so the button becomes invisible (it's still clickable though). Then connect a new function to its pressed signal containing a single line of code meant to release focus from itself:

game_world.gd
func _on_btClearFocus_pressed():
	$HUD/btClearFocus.release_focus()

Once the button is clicked, it will get focus and from its pressed event we clear it.


When you test the game now and enter a fake latency value, you will probably notice that the measured latency does not match the entered value. To be honest I'm not exactly sure where the inaccuracy comes from. But the important thing is that we can simulate a latency and see how badly it affects the game experience. As an example, if you change the fake latency to 50 in a client, not only will the response to inputs lag noticeably behind the key events, but we also still have the choppiness in the animation.

In the next part we will work on some techniques that we can incorporate in order to try to mitigate that lag effect while also maintaining the server in control of the game simulation.
