Rietveld Code Review Tool
Help | Bug tracker | Discussion group | Source code | Sign in

Issue 3731002: HomaTransport

Can't Edit
Can't Publish+Mail
Start Review
5 years, 1 month ago by yilongl
5 years ago
john.ousterhout, ouster



Patch Set 1 #

Total comments: 68
Unified diffs Side-by-side diffs Delta from patch set Stats (+709 lines, -210 lines) Patch
M GNUmakefile View 1 chunk +4 lines, -0 lines 0 comments Download
M scripts/cluster.py View 2 chunks +2 lines, -0 lines 0 comments Download
M src/BasicTransport.h View 17 chunks +174 lines, -36 lines 19 comments Download
M src/BasicTransport.cc View 45 chunks +397 lines, -131 lines 37 comments Download
M src/DpdkDriver.h View 3 chunks +4 lines, -10 lines 1 comment Download
M src/DpdkDriver.cc View 6 chunks +38 lines, -6 lines 2 comments Download
M src/Driver.h View 4 chunks +25 lines, -6 lines 3 comments Download
M src/InfUdDriver.h View 1 chunk +3 lines, -2 lines 0 comments Download
M src/InfUdDriver.cc View 3 chunks +6 lines, -4 lines 1 comment Download
M src/Makefrag View 2 chunks +2 lines, -1 line 0 comments Download
M src/MakefragClient View 1 chunk +1 line, -0 lines 0 comments Download
M src/MockDriver.h View 1 chunk +2 lines, -1 line 0 comments Download
M src/MockDriver.cc View 1 chunk +2 lines, -1 line 0 comments Download
M src/NetUtil.h View 2 chunks +9 lines, -2 lines 3 comments Download
M src/SolarFlareDriver.h View 1 chunk +5 lines, -4 lines 1 comment Download
M src/SolarFlareDriver.cc View 1 chunk +5 lines, -4 lines 1 comment Download
M src/TransportManager.cc View 3 chunks +26 lines, -0 lines 0 comments Download
M src/UdpDriver.h View 1 chunk +2 lines, -1 line 0 comments Download
M src/UdpDriver.cc View 1 chunk +2 lines, -1 line 0 comments Download


Total messages: 2
First implementation of the HomaTransport. I haven't been able to run it on CloudLab (working ...
5 years, 1 month ago (2016-12-05 23:23:06 UTC) #1
5 years, 1 month ago (2016-12-20 05:22:15 UTC) #2
The small changes you made look pretty good, but the big changes (the new code
at the end of HomaTransport.cc) don't look so good. It looks to me like you've
allowed a lot of complexity to creep into that code.

File src/BasicTransport.cc (right):

src/BasicTransport.cc:93: , maxGrantedMessages(3)
Initialize to 0 here, or to "()", then provide the default value 3 before line
105. That way, all of initialization for maxGrantedMessages is in one place.

src/BasicTransport.cc:111: }
I don't understand why you need to DIE here. For example, you don't need to use
all of the available packet priorities. Let's discuss.

src/BasicTransport.cc:114: int numUnschedPrio =
driver->getHighestPacketPriority() -
Pull the initialization of lowestUnschedPriority down here, so all of the
priority calculations are in one place.

src/BasicTransport.cc:118: unschedTrafficPrioBrackets[i] =
unschedTrafficPrioBrackets[i-1] << 1;
Why this particular policy? Seems like this could waste priority levels. Let's

src/BasicTransport.cc:127: Cycles::toSeconds(timerInterval)*1e3);
Include in this log message information about the priority allocation.

src/BasicTransport.cc:171: */
It doesn't seem to me that this message needs to be public.

src/BasicTransport.cc:178: return
downCast<uint8_t>(driver->getHighestPacketPriority() - i);
I don't think it should be necessary to call this method every time you're
computing a packet priority. Also, this won't be correct if we're not using all
of the available priorities.

Hmmm, you actually invoke the method twice! See if you can avoid calling it at
all: this method is in the inner loop, where every ns counts.

src/BasicTransport.cc:338: *      Priority used to send the packets.
Use names consistently. Either call it "priority" everywhere (my personal
preference) or "prio".

src/BasicTransport.cc:811: }
This code assumes that the sender will always send exactly roundTripBytes of
unscheduled data, which creates an undocumented dependency. I think it would be
safer and more general if the DATA packet header included an explicit indication
of how many unscheduled bytes will be sent for this message. For example, we
will eventually adjust the amount of unscheduled bytes dynamically based on
server load (if the server's link is overloaded, there's no need to send a full
round trip worth of unscheduled data).

src/BasicTransport.cc:815: if (clientRpc->accumulator->packetLossIndicator > 2)
There is an undocumented heuristic in this line; let's discuss.

src/BasicTransport.cc:831: // TODO: why would the sender put NEED_GRANT into the
data packet? couldn't the receiver figure it out?
Let's discuss this. I believe that NEED_GRANT should be replaced with a count of
the number of unscheduled bytes.

src/BasicTransport.cc:913: &ack, NULL, driver->getHighestPacketPriority());
Don't use driver->getHighestPacketPriority() here (or elsewhere, as much as
possible): that creates an implicit dependency where you are assuming that
control packets will be sent with the highest available priority. Instead, store
an explicit value in the transport, such as "controlPacketPriority", which is
computed explicitly and used. This will make things more obvious.

In general, avoid using one thing to represent another: this creates an unstated
assumption, which can lead to confusion later.

src/BasicTransport.cc:925: // Use the higest priority for retransmission
I think that the sender should determine retransmission priorities, just like it
determines other priorities. Let's discuss.

src/BasicTransport.cc:1025: if (serverRpc->totalLength > roundTripBytes) {
Same comment as for client responses.

src/BasicTransport.cc:1295: if (fragments.find(header->offset) ==
fragments.end()) {
Why did you add this test? Let's discuss.

src/BasicTransport.cc:1337: *      Total size of the packet at payload.
It looks to me like you have broken this comment: I don't see a payload

src/BasicTransport.cc:1546: activeMessages.end()) {
I think this may have a bug where you don't request retransmissions. If the
message consists only of unscheduled bytes, then it won't be on the
activeMessages list, so if packets are lost we won't request retransmissions,
no? I believe there is a similar problem for serverRpc's below.

By the way, this illustrates the risks of using one thing (presence on the
activeMessages list) to represent another thing that isn't quite the same (need
to request retransmissions if packets don't arrive).

src/BasicTransport.cc:1613: * to enter the new message into the scheduler.
Document the parameter.

src/BasicTransport.cc:1618: int rank = 0;
Add a short comment giving the overall idea of what the loop below is doing.

src/BasicTransport.cc:1621: if (*newMessage < m) {
This code feels unnecessarily obscure. Why compare messages, as opposed to
message lengths? Let's discuss.

src/BasicTransport.cc:1633: highestGrantedPrio++;
I don't understand why this code is here. Shouldn't scheduleNewMessage take care
of this, if it is needed? Ditto for line 1654 below and other places. Let's

src/BasicTransport.cc:1650: // The new message should replace the "worst" active
I don't understand this: shouldn't the new message be inserted in a place that
represents its priority relative to the existing active messages? Let's discuss.

src/BasicTransport.cc:1669: * decision and 2) send out a grant for another data
What does "make a change in its scheduling decision" mean? Say a bit more about

src/BasicTransport.cc:1672: *      The message that receives new data.
Has the packet already been processed when this method is called?

BasicTransport::scheduledMessageReceiveData(IncomingMessage* message)
Not a great name; let's discuss alternatives.

src/BasicTransport.cc:1680: // The message has not been fully received.
If the message has been fully received, then I would expect it to be removed
from the active list, so this method would never be invoked. Let's discuss.

src/BasicTransport.cc:1684: activeMessages.iterator_to(*message);
It looks like iterator_to gets called in a bunch of places. Do you know if this
method is expensive? Would it makes sense simply to keep a state variable in the
message indicating whether or not it's active?

src/BasicTransport.cc:1690: scheduleNewMessage(message);
Why are you erasing and rescheduling the message? Let's discuss.

src/BasicTransport.cc:1695: // no other active messages are from the same sender
as it.
I don't understand why receiving a packet on a backup message causes it to get
promoted to active. Let's discuss.

src/BasicTransport.cc:1714: // granted.
If the message was purged, why was this method invoked in the first place? Let's

Also, the method won't necessarily do "nothing" in this case: it will still fall
through to the code below.

src/BasicTransport.cc:1722: int rank = 0;
This variable is important enough (and the method is long and complicated
enough) that it needs to be documented.

src/BasicTransport.cc:1728: ? m->grantOffset : roundTripBytes;
I think you should be able to eliminate this special-case check by initializing
grantOffset properly when the first packet is received. Let's discuss.

src/BasicTransport.cc:1731: // Slow path: a message has been fully granted.
Purge this
This is confusing, because you haven't actually granted the whole message yet.
That happens way below. See if you can create a more logical order to the
method; not only will it make the code more readable, but it's also more likely
to produce correct behavior.

src/BasicTransport.cc:1751: }
It's a red flag to me that this special-case code for sameSender keeps appearing
over and over. Find a way to do it only once.

src/BasicTransport.cc:1776: string fmt;
Make fmt a char*, not string. Otherwise, you risk doing storage allocation when
fmt is allocated, which is a bad idea since this method is on the critical path,
and in the normal case we are going to ignore fmt anyway.

src/BasicTransport.cc:1808: }
This method, and addScheduledMessage, are too complicated: I can't yet
understand how they work, and I can't convince myself that they are correct
(given their complexity, I doubt that they are correct). I'm not even totally
sure I understand what this method is supposed to do; I also don't understand
how grant priorities are computed (it looks like priority may be implicitly
determined by the position of a message in the list, which does not appear to be

Rewrite this code in a way that is significantly simpler. You will need to
figure out how to tease apart the functionality so you don't keep repeating the
same tests, and so the code structure is more obvious.

src/BasicTransport.cc:1812: * ordering within the list.
Always document all arguments.

File src/BasicTransport.h (right):

src/BasicTransport.h:195: // packet arrives that can be added into the buffer.
Let's discuss; I don't understand how this is going to be used.

src/BasicTransport.h:207: class IncomingMessage {
Maybe combine IncomingMessage with MessageAccumulator? Let's discuss.

src/BasicTransport.h:210: enum RpcKind { CLIENT_RPC, SERVER_RPC };
I think we discussed this earlier, but this field, both the name and the values,
aren't precise enough. For example, does CLIENT_RPC mean the message is coming
from the client, or going to the client? Also, I don't understand what "RPC"
means in these names. Need better names and/or more documentation.

For consistency, first list all of the public methods for the class, if any,
then private methods, then instance variables.

src/BasicTransport.h:217: virtual RpcId getRpcId() = 0;
Rather than having a getter method, can't this just be a public instance
variable? Let's discuss.

src/BasicTransport.h:232: /// Used to link this object into t->backupMessages.
I believe we already discussed that "backupMessages" is not a good name; how
about "interactiveMessages" instead?

src/BasicTransport.h:242: /// The (likely) unique identifier for the sender of
this message.
I don't understand what "likely" means; let's discuss. This documentation
probably needs to be more precise.

src/BasicTransport.h:246: /// not received the first packet of the message.
Why would this structure even exist if we have not yet received the first packet
of the message? I would have guessed that this structure wouldn't be used except
for messages that require more than one packet, and for which the first packet
has already been received. Let's discuss.

Also, "totalLength" is too vague of a name. Let's discuss something more

src/BasicTransport.h:260: * Comparison function used by the message scheduler.
Need more documentation: this is not precise enough. Exactly what does the
return value mean?

src/BasicTransport.h:278: class ClientRpc : public IncomingMessage {
Instead of a ClientRpc being an instance of IncomingMessage (which feels
awkward), why not give it a member variable that is Tub<IncomingMessage>? This
way, you would never even initialize or use the IncomingMessage field unless the
response requires more than one packet. Let's discuss.

src/BasicTransport.h:303: uint8_t transmitPriority;
I believe we discussed this field and decided that it isn't necessary.

However, on reading more code, I think it probably is necessary. But, the
documentation isn't really correct. Let's discuss.

src/BasicTransport.h:318: // TODO: HOW IS IT ACTUALLY USED?
Let's discuss.

src/BasicTransport.h:580: // this GRANT.
Documentation not quite right; the priority only refers to untransmitted bytes
up to offset.

src/BasicTransport.h:665: void scheduleNewMessage(IncomingMessage *newMessage);
Preserve the alphabetical ordering of these method declarations.

src/BasicTransport.h:796: /// priority for the entire unscheduled portion of the
This documentation is pretty hard to parse. Maybe give a specific example to

Also, why does entry zero here correspond to the highest priority, whereas
priority zero is the lowest priority? This seems likely to cause confusion.

src/BasicTransport.h:797: uint32_t
Do these priorities apply to *us* when we send, or to other machines  (i.e., who
is "the sender")?

src/BasicTransport.h:806: /// been granted in its entirety, it will be removed
from the list.
Is there any meaning to the ordering in this list, or in backupMessages?

src/BasicTransport.h:811: /// Holds the so-called backup messages that are
awared by the scheduler
"are awared by the scheduler" => "the scheduler is aware of"

Instead of "aware of", it would be more precise to say that we
have received at least one packet for each of these messages,
but haven't yet fully granted them and aren't actively sending
grants to them in order to avoid overloading the network.

src/BasicTransport.h:820: /// [-1, #lowestUnschedPrio). -1 means no message is
being scheduled
This documentation is good.

File src/DpdkDriver.cc (right):

src/DpdkDriver.cc:151: // TODO(YilongL): I don't understand the code here
Let's discuss; it looks like the comment in lines 144-146 is stale (it used to
be possible to specify the MAC address explicitly, but I believe I changed the
code to always get the MAC address from the NIC). Once we figure this out,
please update the comment.

src/DpdkDriver.cc:267: #ifdef ETHERNET_DOT1P
The symbol name ETHERNET_DOT1P is pretty cryptic. How about something with more
obvious meaning, such as "ENABLE_ETHERNET_PRIORITIES"?

File src/DpdkDriver.h (right):

src/DpdkDriver.h:61: virtual uint8_t getHighestPacketPriority();
I recommend making priorities an int value, rather than uint8_t. I don't see any
advantage in narrowing the type, and if in the future we want more than 256
priorities (admittedly unlikely), it won't cause compatibility problems.

File src/Driver.h (right):

src/Driver.h:214: * The highest packet priority level this Driver supports (0 is
"The" => "Returns the"

src/Driver.h:356: *      The priority level of this packet.
Add "0 is lowest priority" to the documentation. Same on line 381.

src/Driver.h:359: const void *header,
It's our coding style to associate the "*" with the type, not the variable, so
it should be "Address* header", not "Address *header".

File src/InfUdDriver.cc (right):

src/InfUdDriver.cc:656: free(bufferMemory);
I don't understand why you made this change; let's discuss.

File src/NetUtil.h (right):

src/NetUtil.h:39: VLAN_TAGGED = 0x8100,   // VLAN-tagged frame (IEEE 802.1Q)
Need a bit more documentation here, so that people can understand this code
without reading the 802.1Q standard. For example:

"Special payload "type" that indicates that the packet uses
an expanded header containing VLAN and priority information.
The actual payload type will be in the tpid field."

Also, VLAN_TAGGED is a pretty cryptic name; let's discuss alternatives.

src/NetUtil.h:61: uint16_t tpid;          // Tag protocol identifier.
The name and documentation for this field are cryptic. Let's discuss how to
improve them.

src/NetUtil.h:63: // DEI (1 bit) and VID (12 bits).
Documentation needs to be more precise so it's obvious exactly which bits are
which. Also, let's discuss a better name for this field.

File src/SolarFlareDriver.cc (right):

src/SolarFlareDriver.cc:536: const void *header,
Revert the *changes.

File src/SolarFlareDriver.h (right):

src/SolarFlareDriver.h:58: const void *header,
Move the asterisks back again (see note in other file).
Sign in to reply to this message.

Powered by Google App Engine
RSS Feeds Recent Issues | This issue
This is Rietveld aab5469