EnterpriseDB: The Enterprise Postgres Company Postgres Plus Forums: The PostgreSQL Open Source Database from EnterpriseDB

Parallel data loading

sijie_guo

New member

Joined: 10 Jun 2010 03:35:15
Messages: 3
Offline

Hi,

This is my first time using GridSQL.
I tried to load 200 GB of data into a partitioned table on a 20-node GridSQL cluster using gs-loader. It seems that gs-loader sends the data to the coordinator, which then partitions it and distributes it to the agents, so the coordinator becomes a bottleneck during loading.

Are there any tools for loading the data in parallel?

Regards,

Samuel
Andrei_M

Senior member

Joined: 19 Dec 2008 01:37:13
Messages: 116
Offline

Hi Samuel,

You can try setting up multiple coordinators that share the same nodes and the same metadata database. Do not run createmddb.sh or createdb.sh for them! They can then load data in parallel.
You will have problems if you are loading indexed tables or tables with autoincrement fields, including the "with rowid" case.
However, this trick should work for plain tables.
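As a rough illustration of how the input could be divided among several coordinators, here is a minimal Python sketch. It assumes the data file holds one record per line (no embedded newlines) and that the table is a plain one, as described above. The names split_round_robin, load_in_parallel and build_command are hypothetical helpers, not part of GridSQL; build_command is supplied by the caller and must return the loader command line already used for a single coordinator, pointed at the given coordinator.

```python
import itertools
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def split_round_robin(src, n_chunks, out_dir):
    """Split a line-oriented data file into n_chunks files, dealing lines out in turn."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = [out_dir / f"chunk_{i}.dat" for i in range(n_chunks)]
    # newline="" keeps the original line endings untouched.
    outs = [p.open("w", encoding="utf-8", newline="") for p in paths]
    try:
        with open(src, encoding="utf-8", newline="") as f:
            for out, line in zip(itertools.cycle(outs), f):
                out.write(line)
    finally:
        for out in outs:
            out.close()
    return paths


def load_in_parallel(chunks, coordinators, build_command):
    """Run one loader process per chunk, each against its own coordinator.

    build_command(chunk, coordinator) is a caller-supplied (hypothetical)
    function returning the argument list of the loader to execute.
    """
    def run(pair):
        chunk, coordinator = pair
        # check=True raises CalledProcessError if the loader exits non-zero.
        return subprocess.run(build_command(chunk, coordinator), check=True).returncode

    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        return list(pool.map(run, zip(chunks, coordinators)))
```

Each chunk is paired with one coordinator, so the number of chunks should match the number of coordinators you have configured. Round-robin splitting keeps the chunks roughly equal in size without reading the file twice.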

Andrei

sijie_guo

New member

Joined: 10 Jun 2010 03:35:15
Messages: 3
Offline

I will try this trick. Thanks, Andrei.
sijie_guo

New member

Joined: 10 Jun 2010 03:35:15
Messages: 3
Offline

It seems that an agent is configured to connect to just one coordinator.
If I set up multiple coordinators, it seems that I need to start multiple agents on each slave node. Is that right?
Vibhor_K

Senior member

Joined: 3 Jul 2009 09:46:15
Messages: 444
Offline

sijie_guo wrote: "If I set up multiple coordinators, it seems that I need to start multiple agents on each slave node. Is that right?"

Each agent reports to one coordinator.

Thanks & Regards,
Vibhor Kumar
Blog:http://vibhork.blogspot.com
Andrei_M

Senior member

Joined: 19 Dec 2008 01:37:13
Messages: 116
Offline

Samuel,

You can configure extra coordinators without agents; just specify xdb.node.N.dbhost for each node in the main config file.
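To make the xdb.node.N.dbhost pattern concrete, here is a small Python sketch that generates one such line per node. It assumes the config file uses key=value lines and that nodes are numbered from 1; both are assumptions on my part, so check them against your existing gridsql.config. The function name and host names are made up for illustration.

```python
def node_dbhost_lines(hosts):
    """Return one xdb.node.N.dbhost line per host, numbering nodes from 1 (assumed)."""
    return [f"xdb.node.{n}.dbhost={host}" for n, host in enumerate(hosts, start=1)]


if __name__ == "__main__":
    # Hypothetical host names; replace with the database hosts of your nodes.
    for line in node_dbhost_lines(["node01.example.com", "node02.example.com"]):
        print(line)
```

The same lines would go into the config file of every extra coordinator, so that all coordinators address the same set of nodes.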

Andrei

ranga.gopalan@sorrisotech.com

Member

Joined: 23 Feb 2010 12:55:49
Messages: 19
Offline

Hi Andrei,

A couple of questions about this:

1. Do you mean that we set up multiple gridsql.config files, each with the coordinator on a different port, and run gs-server.sh against each config file to start multiple processes?

2. In this scenario, I guess we need to change our client side, perhaps to hold multiple URLs for the target coordinators and to use some logic to distribute the client load across them.

Thanks,

Ranga
Andrei_M

Senior member

Joined: 19 Dec 2008 01:37:13
Messages: 116
Offline

Hi Ranga,

This trick applies only to initial loading.
If you configure multiple coordinators and access them concurrently with generic clients, you may end up with a corrupted database.

Andrei

 