RPC Redux

Thank you for coming here today, to the first conference dedicated to Finagle. (This is the written version of a talk I gave at FinagleCon, Aug 13, 2015 in San Francisco, CA.) Thanks to Travis, Chris, and the other organizers. Thank you to all the contributors, early adopters, and other supporters who have helped along the way.

We’ve been using Finagle in production at Twitter for over 4 years now. (While the first commit was on Oct 18, 2010, it was first used in production early the following year. Interestingly, the first production service using Finagle was Twitter’s web crawler, written in Java. This was done by Raghavendra Prabhu, who took the model to heart: the Java code was written in a very functional style, flatMaps abound.) We’ve learned a lot, and have had a chance to explore the design space thoroughly.

While Finagle is at the core of nearly every service at Twitter, the open source community has extended our reach much further than we could have imagined. I think this conference is a testament to that. I’m looking forward to the talks here today, and I’m curious to see how Finagle is used elsewhere. It’s exhilarating to see it being used in ways beyond our own imaginations.

There have been many changes to Finagle along the way, but the core programming model remains unchanged, namely the use of: Futures to structure concurrent code in a safe and composable manner; Services to define remotely invocable application components; and Filters to implement service-agnostic behavior. This is detailed in “Your Server as a Function.” (The paper was published at PLOS ’13; the slides from the workshop talk are also available.) I encourage you all to read it.

I believe the success of this model stems from three things: simplicity, composability, and good separation of concerns.

First, each abstraction is reduced to its essence. A Future is just about the simplest way to understand a concurrent computation. A Service is not much more than a function returning a Future. A Filter is essentially a way of decorating a Service. This makes the model easy to understand and intuit. It’s almost always obvious what everything is for, and where some new functionality should belong.

Second, the data structures compose. Futures are combined with others, either serially or concurrently, to form more complex dependency graphs. Services and Filters combine as functions do: Filters may be stacked, and eventually combined with a Service; Services are combined into a pipeline. Compositionality lets us build complicated structures from simpler parts. It means that we can write, reason about, and test each part in isolation, and then put them together as required, as the sketch below illustrates.

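To make the composition concrete, here is a minimal sketch of the three abstractions working together. The types are Finagle’s; the LogLatency filter and the echo service are my own illustration, not part of the library.

import com.twitter.finagle.{Service, SimpleFilter}
import com.twitter.util.{Future, Time}

// A toy filter: decorate any service with latency logging.
class LogLatency[Req, Rep] extends SimpleFilter[Req, Rep] {
  def apply(req: Req, service: Service[Req, Rep]): Future[Rep] = {
    val start = Time.now
    service(req).ensure {
      println(s"request took ${Time.now - start}")
    }
  }
}

// A trivial service: not much more than a function returning a Future.
val echo = Service.mk[String, String] { s => Future.value(s) }

// Filters stack, and are eventually combined with a Service.
val logged: Service[String, String] =
  (new LogLatency[String, String]) andThen echo
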
Finally, the abstractions promote good separation of concerns. Concurrency concerns are separated from how application components are defined. Behavior (filters) is separated from domain logic (services). This is more than skin deep: Futures encourage the programmer to structure her application as a set of computations that are related by data dependencies–a data flow. This divorces the semantics of a computation, which are defined by the programmer, from its execution details, which are handled by Finagle.

In some sense, and certainly in retrospect, this design–Futures, Services, Filters–feels inevitable. I don’t think it’s possible to have any of these properties in isolation. They are a package deal. It’s difficult to imagine good compositionality without also having separation of concerns. Composition is quickly defeated by contending needs. For example, we can readily compose Services and Filters because Futures allow them to act like regular functions. If execution semantics were tied into the definition of a Service, it would be difficult to imagine such simple composition.

Where RPC breaks down

The simplicity and clarity of Your Server as a Function belie the true nature of writing and operating a distributed RPC-based system. Using Finagle in practice demands much of developers and operators both–most serious deployments of Finagle require users to become experts in its internals: Finagle must be configured, statistics must be monitored, and applications must be profiled so that we stand a chance of understanding it when it misbehaves.

Let’s consider two illustrative examples where this simplicity breaks down: creating resilient systems, and creating flexible systems.

Example 1, resilient systems

This quality of being able to store strain energy and deflect elastically under a load without breaking is called ‘resilience’, and it is a very valuable characteristic in a structure. Resilience may be defined as ‘the amount of strain energy which can be stored in a structure without causing permanent damage to it’.

A resilient software system is one which can withstand strain in the form of failure, overload, and varying operating conditions.

Resilience is an imperative in 2015: our software runs on the truly dismal computers we call datacenters. Besides being heinously complex–and we all know complexity is the enemy of robustness–they are unreliable and prone to operator error. Datacenters are, however, the only computers we know how to build cheaply and at scale. We have no choice but to build into our software the resiliency forfeited by our hardware.

That I called it a resilient system is no mistake. Of the problems we encounter on a daily basis, resiliency is perhaps the most demanding of our capacity for systems thinking. An individual process, server, or even a datacenter is not by itself resilient. The system must be.

Thus it’s no surprise that resilient systems are created by adopting a resiliency mindset. It’s not sufficient to implement resilient behavior at a single point in an application; it must be considered everywhere.

Finagle, occupying the boundary between independently failing servers, supplies many important tools for creating resilient systems. Among these are: timeouts, failure detection, load balancing, queue limits, request rejection, and retries. We piece these together just so to create resilient and well-behaved software. (Of course, this covers only some aspects of resilience. Many tools and strategies are not available to Finagle because it does not have access to application semantics. For example, Finagle cannot tell whether a certain computation is semantically optional from an application’s point of view.)

Here’s an incomplete picture of what it’s like to create a resilient server with Finagle today.

Timeouts. You use server timeouts to limit the amount of time spent on any individual request. Client timeouts are used to maximize your chances of success while not getting bogged down by slow or unresponsive servers. At the same time, you want to leave some room for requests to be retried when they can be. These timeouts also have to consider application requirements. (Perhaps a request is not usefully served if the user is unwilling to wait for its response.)

Retries. You tune your retries so that you have a chance of retrying failed or timed out operations (when this can be done safely). At the same time, you want to make sure that retries will not exhaust your server’s timeout budget, and that you won’t amplify failures and overwhelm downstream systems that are already faltering.

Queue sizes, connection pools. You want to tune inbound queue sizes so that the server cannot be bogged down by handling too many requests at the same time. Perhaps you use Iago to run what you believe are representative load tests, so that you understand your server’s load response and tune queue sizes accordingly. Similarly, you configure your client’s connection pool sizes to balance resource usage, load distribution, and concurrency. You also account for your server’s fan out factor, so that you won’t overload your downstream servers.

Failure handling. You need to figure out how to respond to which failures. You may choose to retry some (with a fixed maximum number of retries); you might make use of alternate backends for others; still others might warrant downgraded results.

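For a sense of how these knobs show up in code, here is a sketch using Finagle’s ClientBuilder; the values are placeholders, and the exact set of builder methods varies across Finagle versions.

import com.twitter.conversions.time._
import com.twitter.finagle.builder.ClientBuilder
import com.twitter.finagle.http.Http

// Each setting interacts with the others: the per-request timeout, the retry
// count, and the connection limit must be harmonized with one another and
// with the servers up- and downstream.
val client = ClientBuilder()
  .codec(Http())
  .hosts("host1:8080,host2:8080,host3:8080")
  .hostConnectionLimit(16)          // bound concurrency toward any one host
  .requestTimeout(200.milliseconds) // per-request budget
  .retries(2)                       // must fit within the caller's overall budget
  .build()
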
Many remarkably resilient servers are constructed this way. It’s a lot of work, and a lot of upkeep: the operating environment, which includes upstream clients and downstream servers, changes all the time. You have to remain vigilant indeed so that your system does not regress. Luckily we have many tools to make sure our servers don’t regress, but even these can test only so much. The burden on developers and operators remains considerable.

It’s noteworthy that a great number of these parameters are intertwined in inextricable ways. If a server’s request timeouts aren’t harmonized with its clients’ request timeouts, their pool sizes, and concurrency levels, you’re unlikely to arrive at your desired behavior. When these parameters fall out of lockstep–and this is exceedingly easy to do!–you may compromise the resiliency of your system. Indeed, a good number of the problems we have had in production stem from this very issue.

More later.

Example 2, flexible systems

Consider a modern Service Oriented Architecture. This is Twitter’s. (Note that this is highly simplified, showing only the general outline and layering of Twitter’s architecture.)

Many things that are quite simple in a monolithic application become quite complicated when its modules are distributed as above.

Consider production testing a new version of a service. In a monolithic application, this is a simple task: you might check out the branch you’re working on, compile the application, and test it wholesale. In order to deploy that change, you might deploy that binary on a small number of machines, make sure they look good, and then roll it out to a wider set of machines.

In a service architecture, it is much more complicated. You could deploy a single instance of the service you have changed, but then you’d need to coordinate with every upstream layer in order to redirect requests to your test service.

It gets worse when changes need to be coordinated across modules, which increases the burden of coordination yet again. For example, your change might require an upstream service to change, perhaps because results need to be interpreted differently.

What about inter-module debugging? Again, in monolithic systems, our normal toolset applies readily: debuggers, printf, logging, DTrace, and heapster all help greatly. Few of these translate directly to a distributed setting.

We also have problems that do not have ready analogs in monolithic applications. For example, how do we balance traffic across alternative deployments of a service?

Decomposing RPC

We’ve been implicitly operating with a very simple model of RPC, namely: the role of an RPC system is to link up two services so that a client may dispatch requests to a server. Put another way, the role of Finagle is to export service implementations from a server, and to import service implementations from remote servers.

In practice, Finagle does much more. Each Service is really a replica set, and this requires load balancing, connection pooling, failure detection, timeouts, and retries in order to work properly. And of course Finagle also implements service statistics, tracing, and other sine qua nons. But this is only to fulfill the above, basic model: to export and to import services.

In many ways, this is the least we can get away with, but, as we have seen, service architectures in practice end up being quite complicated: they are complicated to implement; they are unwieldy to develop for and to operate. The RPC model doesn’t provide direct solutions to these kinds of problems.

I think this calls for thinking about how we might elevate the RPC model to account for this reality.

A service model

Why should we solve these problems in the RPC layer?

A common approach is to introduce indirection at the infrastructure level. You might use software load balancers to redirect traffic where appropriate, or to act as middleware to handle resiliency concerns. You might deploy specialized staging frontends, so that new services can be deployed in the context of an existing production system.

This approach is problematic for several reasons.

—They are almost always narrow in scope, and difficult to generalize. For example, a staging setup might work well for frontends that are directly behind a web server, but what if you want to stage deeper down the service stack?

—They also tend to operate below the session layer, and don’t have access to useful application-level data such as timeouts, or whether an operation is idempotent.

—They constitute more infrastructure. More complexity. More components that can fail.

Further, as we have already seen, many RPC concerns are inextricably entwined with those of resiliency and flexibility. It seems prudent to consider them together.

I think what we need is a coherent model that accounts for resiliency and flexibility. This will give us the ability to create common infrastructure that can be combined freely to accomplish our goals. A good model will be simple, give freedom back to the implementors, and disentangle application logic from behavior from configuration.

I think it’s useful to break our service model down into three building blocks: a component model, a linker, and an interaction model.

A component model defines how system components are written and structured. A good component model is general–we cannot presuppose what users want to write–but must also afford the underlying implementation a great deal of freedom. Further, a component model should not significantly limit how components are structured or how they are pieced together; it should account for both applications and behavior.

A linker is responsible for piecing components together, for distributing traffic. It should do so in a way that allows flexibility.

An interaction model tells how components communicate with each other. It governs how failures are interpreted, how to safely retry operations, and how to degrade service when needed. A good interaction model is essential for creating a truly resilient system.

You may have noticed that these play the part of a module system (component model, linker) and a runtime (interaction model).

We’ve been busy moving Finagle towards this model, to solve the very problems outlined here. Some of it has already been implemented, and is even in production; much of it is ongoing work.

Component model

Finagle already has a good, proven component model: Futures, Services, and Filters.

There isn’t much that I would change here, though one common issue is that it lacks a uniform way of specifying and interpreting failures.

By way of Futures, Services can return any Throwable as an error. This “open universe” approach to error handling is problematic because we cannot assign semantics to errors in a standard way.

We recently introduced Failure as a standard error value in Finagle.

final class Failure private[finagle](
  private[finagle] val why: String,               // human-readable description
  val cause: Option[Throwable] = None,            // the underlying exception, if any
  val flags: Long = 0L,                           // semantic properties, e.g. restartable or interrupted
  protected val sources: Map[Failure.Source.Value, Object] = Map.empty,  // where the failure originated
  val stacktrace: Array[StackTraceElement] = Failure.NoStacktrace,
  val logLevel: Level = Level.WARNING
) extends Exception(why, cause.getOrElse(null))

Failures are flagged. Flags represent semantically crisp and actionable properties of the failure, such as whether it is restartable, or whether it resulted from an operation being interrupted. With flags, server code can express the kind of failure, while client code can interpret failures in a uniform way, for example to determine whether a particular failure is retryable.

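As a sketch of what uniform interpretation buys us, client code can decide retryability from flags alone rather than from a zoo of exception types. This assumes a flag constant along the lines of Failure.Restartable and an isFlagged accessor, roughly as Finagle exposes them.

import com.twitter.finagle.Failure

// Interpret failures uniformly: retry only what has been explicitly
// marked as safe to restart.
def isRetryable(t: Throwable): Boolean = t match {
  case f: Failure => f.isFlagged(Failure.Restartable)
  case _          => false
}
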
Failures also include sourcing information, so that we have a definitive way to tell where a failure originated, remotely or locally.

Linker

Finagle clients are usually created by providing a static host list or a dynamic one using a ServerSet. (A ServerSet is a dynamic host list stored in ZooKeeper. It is updated whenever nodes register, deregister, or fail to maintain a ZooKeeper heartbeat.)

While ServerSets are useful–they allow servers to come and go without causing operational hiccups–they are also limited in their power and flexibility.

We have worked to introduce a symbolic naming system into Finagle. This amounts to a kind of flexible RPC routing system. It uses symbolic addresses as destinations, and allows the application to alter their interpretation as appropriate. We call it Wily.

Wily is an RPC routing system that works by dispatching payloads to hierarchically-named logical destinations. Wily uses namespaces to affect routing on a per-transaction basis, allowing applications to modify behavior by changing the interpretation of names without altering their syntax.

—Wily design document

To explain how it works, let’s start with how the various concepts involved are represented in Finagle.

Finagle is careful to define interfaces around linking, separating the symbolic, or abstract, notion of a name from how it is interpreted.

In Finagle, addresses specify physical, or concrete, destinations. Addresses, by way of SocketAddress, tell how to connect to other components. Since Finagle is inherently “replica-oriented”, where every destination is a set of functionally identical replicas of a component, a fully bound address is represented by a set of SocketAddresses.

sealed trait Addr
case class Bound(addrs: Set[SocketAddress]) extends Addr
case class Failed(cause: Throwable) extends Addr
object Pending extends Addr
object Neg extends Addr

Addresses can occupy other states as well, to account for failure (Failed), pending resolution (Pending), or nonexistence (Neg).

Whereas physical addresses represent concrete endpoints (a set of IP addresses), a name is symbolic, and may be purely abstract. To account for this, Name has two variants,

sealed trait Name
case class Path(path: finagle.Path) extends Name
case class Bound(addr: Var[Addr]) extends Name

The first, Path, describes an abstract name via a path. Paths are rather like filesystem paths: they describe a path in a hierarchy. For example, the path /s/user/main might name the main user service. Note that paths are abstract: the name /s/user/main names only a logical entity, and gives the implementation full freedom to interpret it. More on this later.

The second variant of Name represents a destination that is bound. That is, an address is available. Note here that distribution rears its ugly head: any naming mechanism must account for variability. The address (replica set) is going to change with time, and so it is not sufficient to work with a single, static Addr. Rather, we make use of a Var to model a variable, or changeable, value. (Vars provide a way to do “self-adjusting computation”, making it simple to compose changeable entities.) Thus anything that deals with names must account for variability. For example, Finagle’s load balancer is dynamic, and is able to add and remove endpoints from consideration as a bound name changes.

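To make the variability concrete, here is a minimal sketch that watches an Addr change over time using util’s Var and Event; the hand-updated variable stands in for what a real resolver maintains.

import java.net.{InetSocketAddress, SocketAddress}
import com.twitter.finagle.Addr
import com.twitter.util.Var

// In practice a resolver owns and updates this Var; here we update it by hand.
val addr = Var[Addr](Addr.Pending)

// Anything that consumes a bound name must react to changes like these;
// Finagle's load balancer does exactly this.
addr.changes.respond {
  case Addr.Bound(endpoints) => println(s"balancing over ${endpoints.size} endpoints")
  case Addr.Neg              => println("name does not exist")
  case Addr.Failed(why)      => println(s"resolution failed: $why")
  case Addr.Pending          => println("still resolving")
}

addr() = Addr.Bound(Set[SocketAddress](new InetSocketAddress("10.0.0.1", 8080)))
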
Users of Finagle rarely encounter Name directly. Rather, Finagle will parse strings into names, through its resolver mechanism.

With the help of the resolver we write:

// A static host set.
Http.newClient("host1:8080,host2:8080,host3:8080")

// A (dynamic) endpoint set derived from the
// ServerSet at the given ZK path.
Http.newClient("zk!myzkhost:2181!/my/zk/path")

// A purely symbolic name.
// It must be bound by an interpreter.
Http.newClient("/s/web/frontend")

And of these, only the last results in a Name.Path.

Interpretation

Interpretation is the process of turning a Path into a Bound. Finagle uses a set of rewriting rules to do this. These rules are called delegations, and are written in a dtab, short for delegation table. Delegations are written as

src => dest

for example

/s    =>    /zk/zk.local.twitter.com:2181

A path matches a rule if its prefix matches the source; rules are applied by replacing that prefix with the destination.

In the above example, the path /s/user/main would be rewritten to /zk/zk.local.twitter.com:2181/user/main.

Rewriting isn’t by itself very interesting; we must be able to terminate! And remember, our goal is to turn a path into a bound. For this, we have a special namespace of paths, /$. Paths that begin with /$ access implementations that are allowed to interpret names. These are called Namers. For example, to access the serverset namer, we make use of the path

/$/com.twitter.serverset

The serverset namer interprets its residual path (everything after /$/com.twitter.serverset). For example, the path

/$/com.twitter.serverset/sdzookeeper.local.twitter.com:2181/twitter/service/cuckoo/prod/read

names the serverset at the ZooKeeper path

/twitter/service/cuckoo/prod/read

on the ZooKeeper ensemble named

sdzookeeper.local.twitter.com:2181

Namers can either recurse (with a new path, which in turn is interpreted by the dtab), or else terminate (with a bound name, or a negative result). In the example above, the namer com.twitter.serverset makes use of Finagle’s serverset implementation to return a bound name with the contents of the serverset named by the given path.

Rules are evaluated together in a delegation table, and apply bottom-up–the last matching rule wins. Rewriting is a recursive process: it continues until no more rules apply. Evaluating the name /s/user/main in the delegation table

/zk       => /$/com.twitter.serverset
/srv      => /zk/remote
/srv      => /zk/local
/s        => /srv/prod

yields the following rewrites:

/s/user/main
    /srv/prod/user/main
    /zk/local/prod/user/main
    */$/com.twitter.serverset/local/prod/user/main

(* denotes termination: here the Namer returned a Bound.)

When a Namer returns a negative result, the interpreter backtracks. If, in the previous example, /$/com.twitter.serverset/local/prod/user/main yielded a negative result, the interpreter would backtrack to attempt the next definition of /srv, namely:

/s/user/main
  [1]    /srv/prod/user/main
  [2]    /zk/local/prod/user/main
  [3]    /$/com.twitter.serverset/local/prod/user/main  [neg]
/s/user/main
  [2]    /zk/remote/prod/user/main
  [3]    /$/com.twitter.serverset/remote/prod/user/main

Thus we see that this dtab represents the environment where we first look up a name in the local serverset, and if it doesn’t exist there, the remote serverset is attempted instead.

The resulting semantics are similar to having several namespace overlays: a name is looked up in each namespace, in the order of their layering.

This arrangement allows for flexible linking. For example, I can use one dtab in a production environment, perhaps even one for each datacenter; other dtabs can be used for development, staging, and testing.

Symbolic naming provides a form of location transparency: the user’s code specifies only what is to be linked, not how (or where). This gives the runtime freedom to interpret symbolic names however is appropriate for the given environment; location-specific knowledge is encoded in the dtabs we’re operating with.

Wily takes this idea even further: dtabs are themselves scoped to a single request; they are not global to the process or fixed for its lifetime. Thus we have a base dtab that is amended as necessary to account for behavior that is desired when processing a single request. To see why this is useful, consider the case of staging servers. We might want to start a few instances of a new version of a server to test that it is functioning properly, or even for development or testing purposes. We might want to send a subset of requests (for example, those requests originated by the developers, or by testers) to the staged server, but leave the rest untouched. This is easy with per-request dtabs: we can simply amend an entry for selected requests, for example:

/s/user/main    => /s/user/staged

which would now interpret any request destined for /s/user/main to be dispatched to /s/user/staged.

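In code, that amendment might look like the following sketch, assuming Dtab.unwind and Dtab.local roughly as Finagle exposes them; Dtab.read parses the textual form shown above.

import com.twitter.finagle.Dtab

// Scope a dtab amendment to the computation in this block: requests issued
// here (and, with Mux or HTTP headers, the entire downstream request tree)
// resolve /s/user/main via /s/user/staged.
Dtab.unwind {
  Dtab.local ++= Dtab.read("/s/user/main => /s/user/staged")
  // ... issue requests to /s/user/main here ...
}
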
Finally, per-request dtabs are transmitted across component boundaries, so that these rewrite rules apply to the entire request tree. This allows us to amend the dtab in a central place, while ensuring it is interpreted uniformly throughout. This required us to modify our RPC protocol, Mux. We also have support in Finagle’s HTTP implementation.

This has turned out to be quite a flexible mechanism. Among other things, we have used it for:

Environment definition. As outlined above, we make use of dtabs to define a system’s environment, allowing us to deploy it in multiple environments without changing code or maintaining specialized configuration.

Load balancing. We’ve also made use of dtabs to direct traffic across datacenters. The mechanism allows our frontend web servers to maintain a smooth, uniform way of naming destinations. The interpretation layer incorporates rules and also manual overrides to dynamically change the interpretation of these names.

Staging and experiments. As described above, we can change how names are interpreted on a per-request basis. This allows us to implement staging and experiment environments in a uniform way with no specialized infrastructure. A developer simply has to instantiate her service (using Aurora), and instruct our frontend web servers, through the use of specialized HTTP headers, to direct some amount of traffic to her server.

Dynamic instrumentation. At a recent hack week, Oliver Gould, Antoine Tollenaere, and I made use of this mechanism to implement DTrace-like instrumentation for distributed systems. A small DSL allowed the developer to sample traffic (based on paths and RPC methods), and process the selected traffic. Using this system, we could introspect the system in myriad ways. For example, we could easily print the payloads between every server involved in satisfying a request. We could also modify the system’s behavior. For example, we could test hypotheses about the system’s behavior under certain failure or slow-down conditions.

Interaction model

The final building block we need is an interaction model. Our interaction model should govern how components talk to each other. What does this mean?

Today, Finagle has only a weakly defined interaction model (that is not to say it isn’t there–it very much is!–rather, much of it is implicit), but it’s probably more than 50% of the core implementation! These components include load balancing, queue limits, timeouts and retry logic, and how errors are interpreted. They are a ready source of configuration knobs, many of which must be tuned well for good results.

For example, consider the problem of making sure that a service is simultaneously: well conditioned, so that it prefers to reject some requests instead of degrading service for all; timely, so that users aren’t left waiting; and well behaved (you could also say that a service should respond to backpressure), so that it doesn’t overwhelm downstream resources with requests they cannot handle, and so that it cannot overwhelm itself and get into a situation from which it cannot recover.

Tuning a system to accomplish these goals in concert requires a fairly elaborate exercise. At Twitter, we often end up running load tests to determine this, or use parameters derived from production instances.

This arrangement is problematic for several reasons.

First, the mechanisms we employ–timeouts, retry logic, queueing policies, etc.–serve multiple and often conflicting goals: resource management, liveness, and SLA maintenance. Should we tolerate a higher downstream error rate? Increase the retry budget. But then this might increase our concurrency level! Create bigger connection pools. But some of these may be created to handle only temporary spikes, which could affect the rate of promoted objects. Lower the connection TTL! And so on. Without a complete understanding of the system, it is difficult to even anticipate the net effect of changing an individual parameter.

Second, such configuration is inherently static, and fails to respond to varying operating conditions and application changes over time.

Last, and more troubling, is the fact that this arrangement severely compromises abstraction and thus modularity: the configuration of clients, based on a set of low-level parameters and their assumed interactions, precludes many changes to Finagle. Systems have come to rely on implementation details–and also on the interaction of different implementation details!–that are not specified at all.

We have tied the hands of the implementors.

Introducing goals

We can usually formulate high-level goals for the behavior that we’re configuring with low-level knobs. (And if not: how do you know what you’re configuring for?)

I think our goals usually boil down to two. Service Level Objectives, or SLOs, tell what response times are acceptable. Cost objectives, or COs, tell how much you’re willing to pay for it.

Perhaps the simplest SLO is a deadline: that is, the time after which a response is no longer useful. A simple cost objective is a ceiling on a service’s fan out — that is, an acceptable ratio of inbound to outbound requests over some time window.

The implications of such a seemingly simple change in perspective are quite profound. Suddenly, we have an understanding, at the semantic level, of what the objectives are while servicing a request. This allows us to introduce dynamic behavior. For example, with an SLO, an admission controller can treat a service as a black box, observing its runtime latency fingerprint. The controller can reject requests that are unlikely to be serviceable in a timely manner.

Service Level Objectives can also cross service boundaries with a little help from the RPC protocol. For example, Mux allows the caller to pass a deadline to the callee. HTTP and other extensible protocols may pass these via standard headers.

This is still an area of ongoing work, but I’ll show two simple examples where this is working today.

Deadlines

Deadlines are represented in Finagle with a simple case class

case class Deadline(timestamp: Time, deadline: Time) extends Ordered[Deadline] {
  def compare(that: Deadline): Int = this.deadline.compare(that.deadline)
  def expired: Boolean = Time.now > deadline
  def remaining: Duration = deadline-Time.now
}

The current deadline is defined in a request context, and may be set and accessed at any time. The context is serialized across service boundaries, so that deadlines set in a client are available from the callee-server as well.

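For example, a client or an ingress filter might establish a deadline for everything downstream of it. A sketch, assuming Deadline.ofTimeout and the broadcast context roughly as Finagle exposes them:

import com.twitter.conversions.time._
import com.twitter.finagle.context.{Contexts, Deadline}

// Everything executed within this block, including RPCs whose protocol
// carries the broadcast context, sees a deadline 200 milliseconds from now.
Contexts.broadcast.let(Deadline, Deadline.ofTimeout(200.milliseconds)) {
  // ... call downstream services here ...
}
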
We can now define a filter that rejects requests that are expired.

class RejectExpiredRequests[Req, Rep]
    extends SimpleFilter[Req, Rep] {
  def apply(req: Req, service: Service[Req, Rep]): Future[Rep] =
    Contexts.broadcast.get(Deadline) match {
      case Some(d) if d.expired =>
        Future.exception(Failure.rejected)
      case _ => service(req)
    }
}

Even this simple filter is useful. It implements a form of load conditioning. Consider for example servers that are just starting up, or have just undergone garbage collection. These are likely to experience large delays in request queue processing. This filter makes sure such requests are rejected by the server before spending computational resources needlessly. We choose to reject useless work instead of bogging the server down further. This is all possible because we have an actual definition of what “useless work” is.

Even in this simple form, we could go further. For example, we might choose to interrupt a future after the request’s deadline has expired, so that we can terminate processing, and again avoid wasting resources on needless computation.

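A sketch of that stricter variant, assuming util’s Future#raiseWithin and a Timer in scope; the filter name is my own.

import com.twitter.finagle.{Failure, Service, SimpleFilter}
import com.twitter.finagle.context.{Contexts, Deadline}
import com.twitter.util.{Duration, Future, Timer}

class EnforceDeadline[Req, Rep](implicit timer: Timer)
    extends SimpleFilter[Req, Rep] {
  def apply(req: Req, service: Service[Req, Rep]): Future[Rep] =
    Contexts.broadcast.get(Deadline) match {
      case Some(d) if d.remaining <= Duration.Zero =>
        Future.exception(Failure.rejected)    // already expired: reject outright
      case Some(d) =>
        service(req).raiseWithin(d.remaining) // interrupt work that outlives the deadline
      case None =>
        service(req)                          // no deadline: pass through
    }
}
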
Ruben Oanta and Bing Wei are working on more sophisticated admission controllers that should also be able to account for observed latencies, so that we can reject requests even earlier.

Fan out factor

We have implemented a fan out factor as a simple cost objective. The fan out factor describes an acceptable ratio of requests (from client code) to issues (Finagle’s attempts to satisfy those requests, which include retries). For example, a factor of 1.2 means that Finagle will not increase downstream load by more than 20% over nominal load.

The implementation smooths this budget over a window, so as to tolerate and smooth out spikes. Conversely, we don’t want to create spikes: the window makes sure that you can’t build up a large reservoir only to expel it all at once.

Again this presents a way to specify a high-level goal, and let Finagle attempt to satisfy it for you. The developer is freed from reasoning about fixed retry budgets, trading off the need to tolerate transient failures against load amplification by way of retries. Instead, a single number tells it all.

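To illustrate the idea, here is a toy version (not Finagle’s implementation): each request deposits a fraction of a retry into a capped reservoir, and every retry must withdraw a whole token from it.

// A toy fan out budget: with factor = 1.2, retries add at most 20% extra load.
class FanoutBudget(factor: Double, cap: Double = 10.0) {
  require(factor >= 1.0)
  private[this] var balance = 0.0

  // Called once per request issued by client code.
  def deposit(): Unit = synchronized {
    balance = math.min(balance + (factor - 1.0), cap) // the cap prevents a large reservoir
  }

  // Called before each retry; allowed only if a whole token is available.
  def tryWithdraw(): Boolean = synchronized {
    if (balance >= 1.0) { balance -= 1.0; true } else false
  }
}
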
Conclusion

We’re moving towards a model of RPC that operates at a higher level of abstraction. Instead of treating RPC as a simple remote dispatch primitive, we have introduced high-level targets: symbolic names, service level objectives, and cost objectives.

Symbolic naming gives us abstract destinations whose concrete destinations can be evaluated by the operating environment. It decouples the abstract notion of a destination from its physical interpretation.

Service and cost objectives remove services’ dependencies on low-level configuration and implementation details, and instead give Finagle a great deal of freedom to implement better strategies.

We have untied the hands of the implementors.

Thank you for coming out today, and enjoy the rest of the conference!