First Things First - Where to Start

Mar 4, 2009 at 2:26 PM
So the proposed starting point is the block logic.  I think I'll start in on the F# libraries for how data will be stored locally on each machine in blocks.  This would include any functionality we think a block should have, like:
  1. Appending
  2. Splitting
  3. Max / Min
  4. Sorting
  5. Removing an element
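
To make that list concrete, here is a rough F# sketch of what such a block module might look like. It assumes a block is just an immutable list of comparable items held in memory; the module and function names are made up for illustration and nothing here is settled.

```fsharp
// Hypothetical sketch: a block modeled as an immutable F# list of comparable items.
module Block =
    /// Append an item to the end of the block.
    let append item block = block @ [item]

    /// Split the block into two blocks at the given index.
    let split index block = List.splitAt index block

    /// Largest and smallest items, or None for an empty block.
    let maxMin block =
        match block with
        | [] -> None
        | _ -> Some (List.max block, List.min block)

    /// Return the block's items in ascending order.
    let sort block = List.sort block

    /// Remove the first occurrence of an item, if present.
    let rec remove item block =
        match block with
        | [] -> []
        | x :: rest when x = item -> rest
        | x :: rest -> x :: remove item rest
```

An on-disk block would need a different representation, but the same five operations would still be the surface the rest of the system calls.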

Separate from this would be the logic that actually tracks these blocks.  More on that later, but I'm thinking of a SQL Express db which holds basic info like:

  1. Location of block
  2. First / Last
  3. Size
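
As a rough illustration, one tracking row could map to an F# record like the one below. The field names are hypothetical, and it assumes First / Last are the lowest and highest keys in the block and that Size counts items rather than bytes.

```fsharp
// Hypothetical shape of one tracking row; field names are illustrative only.
type BlockInfo<'Key> =
    { Location : string   // where the block lives, e.g. a local file path
      First : 'Key        // assumed: lowest key stored in the block
      Last : 'Key         // assumed: highest key stored in the block
      Size : int }        // assumed: number of items in the block

/// True when the key falls inside the block's [First, Last] range.
let covers key info = info.First <= key && key <= info.Last
```

A check like `covers` is the sort of lookup the tracking db would answer without having to open the block itself.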

Then obviously we'll need logic for blocks across multiple machines like:

  1. Who else has a copy
  2. Before-first / after-last linked-list-style references to info on other machines relative to a block
  3. Load of blocks on this machine and others for balancing
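
A minimal sketch of how that cross-machine info might be held in F#, assuming machines are identified by name and load is measured as a simple block count. All type and field names here are hypothetical.

```fsharp
// Hypothetical reference to a block that lives on another machine.
type RemoteRef = { Machine : string; BlockId : int }

// Hypothetical cross-machine info kept alongside a block.
type BlockLinks =
    { Replicas : string list       // machines that also hold a copy
      Previous : RemoteRef option  // block covering the range before First
      Next : RemoteRef option }    // block covering the range after Last

/// Pick the least-loaded machine from (machine, blockCount) pairs.
/// Throws on an empty list, so callers need at least one known machine.
let leastLoaded loads = loads |> List.minBy snd |> fst
```

Using `option` for the neighbor references covers the first and last blocks in a chain, which have nothing before or after them.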

Then comes the real crux: the messaging tier through which all machines communicate this information to each other and make decisions about trading, distributing, backing up, etc.

Mar 9, 2009 at 2:41 AM

So in addition to the steps above, I've been thinking about some of the details of sorting, load balancing, and a couple of other elements of the blocks.

I've decided to start with two underlying structures: divvies and regions.  The main difference is that regions are key based (not to say divvies couldn't be) and that preprocessing goes into sorting, deduping, and localizing them (similar to the concept of regions in HBase).  Some data sets may not require this overhead.  For instance, key-based searches, relational data sets, and map reduce may all be good fits for types of regions, but do you really need all that for a distributed grep (search)?

The plan is to have multiple types of structures available, which can be declared when the set is created (and maybe switched at a later time).  I think many of them will come from these two primary concepts.  I would like to get some documentation into source control to start fleshing this out.
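
To illustrate the divvy / region split, here is a small F# sketch, assuming items are string key/value pairs and that the structure kind is chosen when the set is created. The names are hypothetical, and localizing is left out because it depends on the multi-machine logic.

```fsharp
// Hypothetical: the structure kind declared when a set is created.
type StructureKind =
    | Divvy
    | Region

/// Prepare raw (key, value) pairs for storage according to the structure kind.
let prepare kind (items : (string * string) list) =
    match kind with
    | Divvy -> items                // stored as-is, no preprocessing overhead
    | Region ->
        items
        |> List.distinctBy fst      // dedupe: keep the first pair seen for each key
        |> List.sortBy fst          // sort by key
```

A distributed grep could declare its set as a Divvy and skip the sort and dedupe cost, while a key-based lookup would pay for Region preprocessing once, up front.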