| Author |
Message |
![[Post New]](/templates/default/images/icon_minipost_new.gif) 14 Sep 2009 08:08:03
|
David_
Member
Joined: 10 Apr 2008 16:25:24
Messages: 25
Offline
|
I am partitioning my facts table on a column that has 30 different values. There are good reasons to partition on this column, as it avoids a lot of row shipping. My problem is that the distribution of rows between three nodes is roughly 20%, 50% and 30%. My question is whether it is possible to dictate to GridSQL how which values should go on which nodes using the current GridSQL implementation.
Regards,
David
|
|
|
 |
![[Post New]](/templates/default/images/icon_minipost_new.gif) 14 Sep 2009 09:46:29
|
Mason_S
Senior member
Joined: 1 Apr 2008 09:03:08
Messages: 380
Offline
|
Hi David,
Sorry, you cannot do this quite yet. What is the data type? Though not an optimal solution, I suppose you could map these to some other values that give you a more even distribution...
Regards,
Mason
|
|
|
 |
![[Post New]](/templates/default/images/icon_minipost_new.gif) 14 Sep 2009 09:58:07
|
David_
Member
Joined: 10 Apr 2008 16:25:24
Messages: 25
Offline
|
Hi Mason,
Thanks for your feedback. At the moment it is a varchar(50), though I am experimenting with converting it to Enum on the underlying nodes (quite unsuccesful at the moment).
I could experiment with using a dimension table and joins, but as I am using Tableau as my frontend, I believe it will affect performance badly.
I will be able to solve this one way or the other, for example by adding trailing spaces, so it is not a big problem.
Regards,
David
This message was edited 1 time. Last update was at 14 Sep 2009 10:02:02
|
|
|
 |
![[Post New]](/templates/default/images/icon_minipost_new.gif) 15 Sep 2009 21:08:58
|
Mason_S
Senior member
Joined: 1 Apr 2008 09:03:08
Messages: 380
Offline
|
Just curious- what is it that Tableau does that it handles dimensions poorly?
Thanks,
Mason
|
|
|
 |
![[Post New]](/templates/default/images/icon_minipost_new.gif) 16 Sep 2009 06:00:56
|
David_
Member
Joined: 10 Apr 2008 16:25:24
Messages: 25
Offline
|
I have set up Tableau to make it as easy as possible for my future users. This means that all dimensions are available to the user all the time. Unfortunatelly, Tableau then joins in all dimensions even when they are not used in a query at all. So if one fact table joins in 10-20 dimensions that are not even used it severely affects performance.
Regards,
David
|
|
|
 |
![[Post New]](/templates/default/images/icon_minipost_new.gif) 25 Feb 2010 17:18:19
|
David_
Member
Joined: 10 Apr 2008 16:25:24
Messages: 25
Offline
|
I now want to undertake tests, using 10 nodes and partitioning my fact table on a column that has only about 40 unique values. If I am lucky , the data will be partitioned relatively evenly, though even if it does this time, I suspect that at some point in the future, this will become a major headache. If I was to implement some sort of forced partitioning, do you have any idea of which parts of the code would be affected. Where is the actual hashing function that is run on the partitioned column?
|
|
|
 |
![[Post New]](/templates/default/images/icon_minipost_new.gif) 1 Mar 2010 04:05:02
|
Mason_S
Senior member
Joined: 1 Apr 2008 09:03:08
Messages: 380
Offline
|
Yes, with few values, you may get an uneven distribution. Still, you might get that, too, if across those 40 values there is a wide disparity of the number of rows for each.
As a result, you may want to consider partitioning by another column. Any query involving that current column could also then be processed in parallel- your queries may run faster, it depends.
Anyway, we have thought about making this more flexible as well, such as being able to link in your own partitioning functions. If you want to look, you could start with apps/gridsql/src/com/edb/gridsql/metadata/partitions/HashPartitionMap.java.
Regards,
Mason
|
|
|
 |
![[Post New]](/templates/default/images/icon_minipost_new.gif) 1 Mar 2010 04:16:14
|
David_
Member
Joined: 10 Apr 2008 16:25:24
Messages: 25
Offline
|
Thanks Mason. I have thought about partitioning on something else, but it would cause a lot of row shipping for my ETL operation. I will have a look at the code and get back to you if I decide to implement the needed functionality.
Regards,
David
|
|
|
 |
|
|