EnterpriseDB: The Enterprise Postgres Company Postgres Plus Forums: The PostgreSQL Open Source Database from EnterpriseDB
  [Search] Search   [Recent Topics] Recent Topics   [Hottest Topics] Hottest Topics   [Members]  Member Listing   [Groups] Back to home page 

Forcing the partitioning  XML

Forum Index » GridSQL - General
Author Message
David_

Member

Joined: 10 Apr 2008 16:25:24
Messages: 25
Offline

I am partitioning my facts table on a column that has 30 different values. There are good reasons to partition on this column, as it avoids a lot of row shipping. My problem is that the distribution of rows between three nodes is roughly 20%, 50% and 30%. My question is whether it is possible to dictate to GridSQL how which values should go on which nodes using the current GridSQL implementation.

Regards,

David

Mason_S

Senior member

Joined: 1 Apr 2008 09:03:08
Messages: 380
Offline

Hi David,

Sorry, you cannot do this quite yet. What is the data type? Though not an optimal solution, I suppose you could map these to some other values that give you a more even distribution...

Regards,

Mason
[WWW]
David_

Member

Joined: 10 Apr 2008 16:25:24
Messages: 25
Offline

Hi Mason,

Thanks for your feedback. At the moment it is a varchar(50), though I am experimenting with converting it to Enum on the underlying nodes (quite unsuccesful at the moment).

I could experiment with using a dimension table and joins, but as I am using Tableau as my frontend, I believe it will affect performance badly.

I will be able to solve this one way or the other, for example by adding trailing spaces, so it is not a big problem.

Regards,

David

This message was edited 1 time. Last update was at 14 Sep 2009 10:02:02

Mason_S

Senior member

Joined: 1 Apr 2008 09:03:08
Messages: 380
Offline

Just curious- what is it that Tableau does that it handles dimensions poorly?

Thanks,

Mason
[WWW]
David_

Member

Joined: 10 Apr 2008 16:25:24
Messages: 25
Offline

I have set up Tableau to make it as easy as possible for my future users. This means that all dimensions are available to the user all the time. Unfortunatelly, Tableau then joins in all dimensions even when they are not used in a query at all. So if one fact table joins in 10-20 dimensions that are not even used it severely affects performance.

Regards,

David
David_

Member

Joined: 10 Apr 2008 16:25:24
Messages: 25
Offline

I now want to undertake tests, using 10 nodes and partitioning my fact table on a column that has only about 40 unique values. If I am lucky , the data will be partitioned relatively evenly, though even if it does this time, I suspect that at some point in the future, this will become a major headache. If I was to implement some sort of forced partitioning, do you have any idea of which parts of the code would be affected. Where is the actual hashing function that is run on the partitioned column?
Mason_S

Senior member

Joined: 1 Apr 2008 09:03:08
Messages: 380
Offline

Yes, with few values, you may get an uneven distribution. Still, you might get that, too, if across those 40 values there is a wide disparity of the number of rows for each.

As a result, you may want to consider partitioning by another column. Any query involving that current column could also then be processed in parallel- your queries may run faster, it depends.

Anyway, we have thought about making this more flexible as well, such as being able to link in your own partitioning functions. If you want to look, you could start with apps/gridsql/src/com/edb/gridsql/metadata/partitions/HashPartitionMap.java.

Regards,

Mason

[WWW]
David_

Member

Joined: 10 Apr 2008 16:25:24
Messages: 25
Offline

Thanks Mason. I have thought about partitioning on something else, but it would cause a lot of row shipping for my ETL operation. I will have a look at the code and get back to you if I decide to implement the needed functionality.

Regards,

David
 
Forum Index » GridSQL - General
Go to:   
Powered by JForum 2.1.8 © JForum Team