From: Piete Brooks <Piete.Brooks@cl.cam.ac.uk>
Date: Mon, 28 Aug 95 13:28:20 PDT
To: patrick@Verity.COM (Patrick Horgan)
Subject: Re: SSL trouble
In-Reply-To: <9508281854.AA20060@cantina.verity.com>
Message-ID: <"swan.cl.cam.:264930:950828202725"@cl.cam.ac.uk>
MIME-Version: 1.0
Content-Type: text/plain
>> Note that it is a list, and it tries them in order (all A RRs).
> Wouldn't this result in the slaves higher in the list being hammered?
There is not "a" list ...
The list returned will be "tailored" for the calling host.
Thus an EU host will have EU sites near the front, AU hosts will have AU
servers, etc ...
(Well, maybe they'll just be returned in a random order !).
By having multiple A RRs for a name, the DNS will do the pseudo load balancing.
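Purely as an illustration (this is not the real brloop/brclient code, and nothing here names the real servers), a client that pulls back all the A RRs for one name and walks the list in order might look like this Python sketch:

    import socket

    def connect_any(name, port):
        # getaddrinfo hands back every A RR the DNS returned for `name`; the
        # order is whatever the name server chose (tailored, cycled or shuffled).
        for info in socket.getaddrinfo(name, port, socket.AF_INET, socket.SOCK_STREAM):
            sockaddr = info[4]                 # the (address, port) pair for this A RR
            try:
                return socket.create_connection(sockaddr, timeout=30)
            except OSError:
                continue                       # that address is down -- try the next one
        raise OSError("no server in the A RR list answered")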
> Perhaps you want to do something similar to what the later releases of
> bind do with machines with multiple names, and round robin the list.
Indeed -- as above:
1) optimise the list so that "near" servers are used.
2) cycle the servers
3) leave it to the DNS to SHUFFLE.
> If you had a list with hosts A, B, and C, the first request would get
> ABC, the next BCA, the next CAB, and the next back to ABC. That would
> distribute the work between the slaves a bit better.
Yup -- but if it's just "random", I'd probably use a single name ...
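A tiny sketch of that rotation, with the server names invented for the example:

    from collections import deque

    servers = deque(["A", "B", "C"])      # placeholder slave names

    def next_ordering():
        order = list(servers)             # ABC, then BCA, then CAB, then ABC ...
        servers.rotate(-1)                # move the head to the back for next time
        return order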
> Actually this is what I meant, that they would ask for it. My idea would
> be that when a slave is asked for keyspace, if they don't have enough
> they'd ask for the next large chunk. That way the central server doesn't
> ever have to deal with small requests.
Well, the current implementation will give out what it has left; then, on restarting
the main loop, it notices that it has no keys left and asks the main server for more.
So as long as the clients aren't all bunched up, it'll pre-fetch more segments,
and there's a fair chance the client won't have to wait :-))
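Roughly, the shape of that loop (only a sketch -- the names fetch_from_master and main_loop_tick are made up, not the real code):

    class KeyspaceCache:
        def __init__(self, fetch_from_master):
            self.fetch_from_master = fetch_from_master   # asks the main server for segments
            self.segments = []

        def allocate(self, want):
            # Give a client whatever is left right now, up to `want` segments.
            given, self.segments = self.segments[:want], self.segments[want:]
            return given                                 # may be short, or even empty

        def main_loop_tick(self):
            # Back at the top of the main loop: if the cache has run dry,
            # pre-fetch a fresh chunk so the next client need not wait.
            if not self.segments:
                self.segments = list(self.fetch_from_master())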
>> Nah - results still go direct pro tem.
> You might consider it:)
I can always add some more A RRs to sksp-ack to load balance ...
>> These cache servers are asking for non trivial amounts of keyspace.
>> As such there should not be *too* many, and they need to be "managed" ...
>> If one crashes, the logs need to be scanned to see how to restart it (so that
>> it starts by doling out the segments that it had not sub-doled to its clients)
> Quite right. I'd assume that the first level list of slaves would be
> controlled by you.
Possible ....
I've had various offers to host a server ...
> If you're careful enough a slave should be able to go down and come
> back up without losing any state at all.
At a cost ....
Either it has to save state in a form that's easy to reload later,
or save state in a form where it can spend some time at start-up
working out what it has to do [[Hmm -- I might write a script to do that]]
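For the log-replay option, something along these lines would do (file name and record format are invented for the sketch):

    def replay_log(path="allocations.log"):
        outstanding = set()
        with open(path) as log:
            for line in log:
                if not line.strip():
                    continue
                event, segment = line.split()
                if event == "GAVE":        # segment doled out to a client
                    outstanding.add(segment)
                elif event == "ACK":       # client reported that segment done
                    outstanding.discard(segment)
        return outstanding                 # re-dole these before asking for more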
> All brutes/slaves talking to it should be able to continue on with no loss
> of information.
Loss of what info ?
Running copies of brutessl will call brclient to report the ACK.
They will report back the data as normal -- nothing to do with the Allocate
Slave -- and even if it were, they would automatically fall back to another server.
Each brloop will ask for another keyspace, and on finding that the first server
on its list doesn't respond, it'll try the next server on its list; if
none respond, it'll wait a bit and start asking again ...
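The shape of that retry loop, sketched in Python (request_keyspace is a stand-in for the real protocol call; this is not brloop itself):

    import time

    def get_keyspace(servers, request_keyspace, retry_wait=60):
        while True:
            for server in servers:
                try:
                    return request_keyspace(server)
                except OSError:
                    continue              # that server is down -- try the next
            time.sleep(retry_wait)        # none responded: wait a bit, ask again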
> I would put an exponential backoff on the time between retries for the
> brutes talking to the slaves as well as the slaves talking to the master.
Well, I use a multiplicative backoff within limits ....
> (With a limit for the amount of backoff of course.)
Indeed -- how long ?
> If you can't talk to someone you might sleep for 8 seconds and retry,
> if you still couldn't back off to 16, the 32, then 64, then 128, etc...
Well, 60, 120, 180, 240, 300, 300, 300, ...
> the maximum might be somewhere around ten or fifteen minutes, so that within
> ten or fifteen minutes of crashing and being restarted everything would be
> humming along with no manual intervention required on any of the lower levels.
Well, 5 mins ... < 1/3 of a segment ...
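Written out as attempt-to-delay functions, the two schedules under discussion are just (reproducing the numbers above, nothing more):

    def capped_steps(attempt, step=60, cap=300):
        # 60, 120, 180, 240, 300, 300, 300, ... seconds
        return min(step * attempt, cap)

    def exponential(attempt, base=8, cap=15 * 60):
        # 8, 16, 32, 64, 128, ... seconds, capped at ten to fifteen minutes
        return min(base * 2 ** (attempt - 1), cap)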
>> However, note that with big cache servers (as opposed to Local CPU Farm
>> servers where all clients are the same "ID") reports of sub-allocation have
>> to be passed back to the root :-(
> That's a good point. If you want to keep track of who has what, it all has
> to get back to the root eventually.
Yes -- I do -- so that it's possible to tie up requests and ACKs.
> If you use my idea of having the slaves cache the information until the next
> time they'd be contacting the root anyway, (or whenever the timer elapses,)
> then you greatly cut down on the number of small packets seen by the root,
> (and each level of slaves when there's a hierarchy).
Yup -- that's what an ACK reflector will do.
Note that Allocation and ACKs are separate ....
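A sketch of what such an ACK reflector might look like (send_batch_to_root is a placeholder and the 300 second timer is only an example):

    import time

    class AckReflector:
        def __init__(self, send_batch_to_root, max_age=300):
            self.send = send_batch_to_root
            self.max_age = max_age
            self.pending = []
            self.last_flush = time.time()

        def report(self, ack):
            # Cache the client's ACK instead of forwarding it straight away.
            self.pending.append(ack)
            if time.time() - self.last_flush > self.max_age:
                self.flush()

        def flush(self):
            # One batch to the root covers many small client ACKs.
            if self.pending:
                self.send(self.pending)
                self.pending = []
            self.last_flush = time.time()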
> <snicker;> Sounds like we could have a religious war if we wanted,
I was agreeing with you ! You said:
>>> Of course I do everything like this in C++, but I suppose perl would be the
>>> most portable. It's a shame it's so aesthetically displeasing to the eye.
or is the "it" in "shame it's so" not the preceding direct noun, i.e. C++? :-)))