| Author |
Message |
19 Aug 2008 10:33:01
|
Jon_K
Member
Joined: 19 Jun 2008 13:00:13
Messages: 28
Offline
|
Hi All,
I'm running into a couple of problems with the GridSQL coordinator.
1. The first issue is that the coordinator stops accepting connections and allows only one connection at a time. In this case an error is returned to the psql client, which exits gracefully.
2. The second issue is that the coordinator blackholes the psql/php connection attempt and doesn't return a status to the client.
This is what I see in the console.log around the time it happens.
2008-08-18 23:55:04,504 - ERROR Throwing throwable:
com.edb.gridsql.exception.XDBMessageMonitorException: Server has aborted execution, cause is: Timeout expired (3600000)
at com.edb.gridsql.engine.MessageMonitor.checkMessages(Unknown Source)
at com.edb.gridsql.engine.MultinodeExecutor.executeStep(Unknown Source)
at com.edb.gridsql.queryproc.QueryProcessor.executeQueryStep(Unknown Source)
at com.edb.gridsql.queryproc.QueryProcessor.executeQueryExecPlan(Unknown Source)
at com.edb.gridsql.queryproc.QueryProcessor.execute(Unknown Source)
at com.edb.gridsql.parser.SqlInsertTable.fillTempTable(Unknown Source)
at com.edb.gridsql.parser.SqlModifyTable.execute(Unknown Source)
at com.edb.gridsql.engine.ExecutableRequest.execute(Unknown Source)
at com.edb.gridsql.engine.XDBSessionContext.executeRequest(Unknown Source)
at com.edb.gridsql.engine.ServerStatement.execute(Unknown Source)
at com.edb.gridsql.engine.ServerStatement.describe(Unknown Source)
at com.edb.gridsql.engine.XDBSessionContext.describeStatement(Unknown Source)
at com.edb.gridsql.protocol.PgProtocolSession.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
at java.lang.Thread.run(Thread.java:595)
Both issues require a bounce of the coordinator and agents in series.
Thanks
Jon King
|
|
|
 |
20 Aug 2008 13:20:43
|
Mason_S
Senior member
Joined: 1 Apr 2008 09:03:08
Messages: 380
Offline
|
Hi Jon,
When these things happen, is it after the system has been up for a long time? Does it happen as a result of a malformed query or any other GridSQL error (that is, perhaps GridSQL does not handle some other error properly and as a result does not free connections)?
And just to make sure, xdb.maxconnections and xdb.default.threads.pool.maxsize (etc) are set to be more than 1?
When this happens, if you run "SHOW STATEMENTS;", does it show any other activity?
Also, which version are you using?
For 2), are you executing a particularly long query, one that may run for more than one hour? The timeout is in milliseconds, so the 3600000 in your log is one hour. If so, you can raise it by setting xdb.messagemonitor.timeout.millis to a higher number (try 36000000, which is 10 hours, or more). Perhaps this is related to 1), in that multiple long-running queries timed out and could not be cancelled properly on the underlying databases.
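To make the units concrete, here is a small Python sketch (not part of GridSQL) that converts between hours and the millisecond values expected by xdb.messagemonitor.timeout.millis. The helper names are made up for illustration.

```python
# Hypothetical helpers for working out timeout values; not a GridSQL API.
MS_PER_HOUR = 60 * 60 * 1000


def hours_to_millis(hours):
    """Convert a duration in hours to whole milliseconds."""
    return int(hours * MS_PER_HOUR)


def millis_to_hours(millis):
    """Convert a duration in milliseconds to hours."""
    return millis / MS_PER_HOUR


# The timeout reported in the log (3600000) is one hour.
print(millis_to_hours(3600000))  # 1.0
# Ten hours, the value suggested for xdb.messagemonitor.timeout.millis.
print(hours_to_millis(10))       # 36000000
```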
Sorry for the trouble.
Thanks,
Mason
|
|
|
 |
22 Aug 2008 12:56:45
|
Jon_K
Member
Joined: 19 Jun 2008 13:00:13
Messages: 28
Offline
|
Hi Mason,
I am able to get the grid to stop responding when I run the following query without a GROUP BY clause:
SELECT silo_id, count(*) FROM user_fact;
This results in the following console.log entry:
2008-08-22 10:42:46,668 - ERROR Throwing throwable:
com.edb.gridsql.exception.XDBMessageMonitorException: Node 5 has aborted execution, cause is: java.sql.SQLException : ERROR: column "user_fact.silo_id" must appear in the GROUP BY clause or be used in an aggregate function
at com.edb.gridsql.engine.MessageMonitor.checkMessages(Unknown Source)
at com.edb.gridsql.engine.MultinodeExecutor.executeStep(Unknown Source)
at com.edb.gridsql.queryproc.QueryProcessor.executeQueryStep(Unknown Source)
at com.edb.gridsql.queryproc.QueryProcessor.executeQueryExecPlan(Unknown Source)
at com.edb.gridsql.queryproc.QueryProcessor.execute(Unknown Source)
at com.edb.gridsql.parser.SqlSelect.execute(Unknown Source)
at com.edb.gridsql.engine.ExecutableRequest.execute(Unknown Source)
at com.edb.gridsql.engine.XDBSessionContext.executeRequest(Unknown Source)
at com.edb.gridsql.engine.ServerStatement.execute(Unknown Source)
at com.edb.gridsql.engine.ServerStatement.describe(Unknown Source)
at com.edb.gridsql.engine.XDBSessionContext.describeStatement(Unknown Source)
at com.edb.gridsql.protocol.PgProtocolSession.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
at java.lang.Thread.run(Thread.java:595)
Caused by: com.edb.gridsql.exception.XDBWrappedSQLException: java.sql.SQLException : ERROR: column "user_fact.silo_id" must appear in the GROUP BY clause or be used in an aggregate function
at com.edb.gridsql.engine.NodeThread.handleSqlException(Unknown Source)
at com.edb.gridsql.engine.NodeProducerThread.processStep(Unknown Source)
at com.edb.gridsql.engine.NodeProducerThread.run(Unknown Source)
... 1 more
2008-08-22 10:42:46,672 - WARN Message was not consumed: NodeMessage[Type:MSG_ABORT, From:2, To: [0], Session: 1, Request: 16]
2008-08-22 10:42:46,672 - WARN Message was not consumed: NodeMessage[Type:MSG_ABORT, From:2, To: [0], Session: 1, Request: 16]
2008-08-22 10:42:46,682 - WARN Message was not consumed: NodeMessage[Type:MSG_ABORT, From:1, To: [0], Session: 1, Request: 16]
2008-08-22 10:42:46,682 - WARN Message was not consumed: NodeMessage[Type:MSG_ABORT, From:1, To: [0], Session: 1, Request: 16]
2008-08-22 10:42:46,682 - WARN Message was not consumed: NodeMessage[Type:MSG_ABORT, From:8, To: [0], Session: 1, Request: 16]
2008-08-22 10:42:46,682 - WARN Message was not consumed: NodeMessage[Type:MSG_ABORT, From:8, To: [0], Session: 1, Request: 16]
2008-08-22 10:42:46,682 - WARN Message was not consumed: NodeMessage[Type:MSG_ABORT, From:3, To: [0], Session: 1, Request: 16]
2008-08-22 10:42:46,682 - WARN Message was not consumed: NodeMessage[Type:MSG_ABORT, From:3, To: [0], Session: 1, Request: 16]
2008-08-22 10:42:46,682 - WARN Message was not consumed: NodeMessage[Type:MSG_ABORT, From:6, To: [0], Session: 1, Request: 16]
2008-08-22 10:42:46,682 - WARN Message was not consumed: NodeMessage[Type:MSG_ABORT, From:6, To: [0], Session: 1, Request: 16]
2008-08-22 10:42:46,682 - WARN Message was not consumed: NodeMessage[Type:MSG_ABORT, From:4, To: [0], Session: 1, Request: 16]
2008-08-22 10:42:46,682 - WARN Message was not consumed: NodeMessage[Type:MSG_ABORT, From:4, To: [0], Session: 1, Request: 16]
2008-08-22 10:42:46,684 - WARN Message was not consumed: NodeMessage[Type:MSG_ABORT, From:7, To: [0], Session: 1, Request: 16]
2008-08-22 10:42:46,684 - WARN Message was not consumed: NodeMessage[Type:MSG_ABORT, From:7, To: [0], Session: 1, Request: 16]
After this the console stops responding.
I'm running version 1.0.
xdb.default.threads.pool.initsize=5
xdb.default.threads.pool.maxsize=10
The coordinator sits on a standalone server with 4 underlying physical nodes.
Here is my config file
###########################################################################
# Copyright (c) 2008 EnterpriseDB Corporation
#
# gridsql.config
#
# GridSQL configuration file
###########################################################################
###
### Server settings
###
xdb.port=6453
xdb.maxconnections=10
###
### Node & JDBC Pool configuration
###
### Set defaults for all nodes and MetaData database.
### These can be overriden.
xdb.default.dbusername=*****
xdb.default.dbpassword=*****
xdb.default.dbport=5432
### Connection thread defaults for each node
### Note that these are pooled, so the number of clients connected
### to GridSQL can be greater than pool size.
xdb.default.threads.pool.initsize=5
xdb.default.threads.pool.maxsize=10
### Connectivity for MetaData database
xdb.metadata.database=XDBSYS
xdb.metadata.dbhost=den3db025
### The number of nodes in cluster
xdb.nodecount=8
### The hosts of the underlying databases
xdb.node.1.dbhost=den3db025
xdb.node.2.dbhost=den3db025
xdb.node.3.dbhost=den3db026
xdb.node.4.dbhost=den3db026
xdb.node.5.dbhost=den3db027
xdb.node.6.dbhost=den3db027
xdb.node.7.dbhost=den3db028
xdb.node.8.dbhost=den3db028
### Designate coordinator node number
### In practice, the coordinator node should be the node where
### GridSQL is running.
xdb.coordinator.node=1
### The next few sections are required when wanting to run agents on the nodes.
### Uncomment them and modify to communicate with agents
###
### Only for agent version
### Port for node's SocketCommunicator
#xdb.node.1.port=6455
#xdb.node.1.host=den3db025
#xdb.node.2.port=6455
#xdb.node.2.host=den3db025
xdb.node.3.port=6455
xdb.node.3.host=den3db026
xdb.node.4.port=6456
xdb.node.4.host=den3db026
xdb.node.5.port=6455
xdb.node.5.host=den3db027
xdb.node.6.port=6456
xdb.node.6.host=den3db027
xdb.node.7.port=6455
xdb.node.7.host=den3db028
xdb.node.8.port=6456
xdb.node.8.host=den3db028
### Designate coordinator node
### In practice, the coordinator node should be the node where
### GridSQL is running.
xdb.coordinator.host=den3dbprx01
xdb.coordinator.port=6454
# Specify protocol types.
# Can use local connection between coordinator and node 1,
# since they are the same system
#xdb.connector.0.1=0
#xdb.connector.1.0=0
#xdb.connector.1.2=0
#xdb.connector.2.1=0
|
|
|
 |
22 Aug 2008 13:48:29
|
Mason_S
Senior member
Joined: 1 Apr 2008 09:03:08
Messages: 380
Offline
|
Thanks for the info.
It sounds like we have two issues to investigate. First, when a problem occurs, it should be handled gracefully, with pooled connections cleaned up and freed; this is an area we have previously spent time on and verified. Second, the semantic checks should catch a missing GROUP BY before we parallelize the query and send an invalid statement to the backends.
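As a toy illustration of the second point, written in Python rather than GridSQL's actual Java code: given a select list that has already been split into plain columns and aggregates, report any plain column that is missing from GROUP BY. The function name and the pre-parsed inputs are assumptions made for this sketch; the real check would work on GridSQL's own parse tree, and positional references such as GROUP BY 1 are not handled here.

```python
def missing_group_by_columns(plain_columns, has_aggregate, group_by):
    """Return the plain select columns that should be in GROUP BY but are not.

    plain_columns: non-aggregate column names in the select list
    has_aggregate: True if the select list contains an aggregate such as count(*)
    group_by:      column names listed in the GROUP BY clause
    """
    if not has_aggregate and not group_by:
        return []  # no aggregation at all, so nothing to check
    grouped = set(group_by)
    return [col for col in plain_columns if col not in grouped]


# SELECT silo_id, count(*) FROM user_fact;
print(missing_group_by_columns(["silo_id"], True, []))           # ['silo_id']
# SELECT silo_id, count(*) FROM user_fact GROUP BY silo_id;
print(missing_group_by_columns(["silo_id"], True, ["silo_id"]))  # []
```

A check like this, run on the coordinator, would reject the query before any node sees it.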
Thanks,
Mason
|
|
|
 |
22 Aug 2008 13:57:54
|
Jon_K
Member
Joined: 19 Jun 2008 13:00:13
Messages: 28
Offline
|
Does the fact that I have xdb.coordinator.node set to 1 when the coordinator doesn't really reside on that node raise any issues?
This message was edited 1 time. Last update was at 22 Aug 2008 13:59:41
|
|
|
 |
22 Aug 2008 17:26:15
|
Mason_S
Senior member
Joined: 1 Apr 2008 09:03:08
Messages: 380
Offline
|
Hi Jon,
xdb.coordinator.node can indeed be separate from where the coordinator process is running.
It really just indicates a preference for which underlying database instance to use in a couple of cases, such as when a single consolidated intermediate table is created. This, however, is a very rare occurrence, as nearly all steps try to make use of all available nodes, and when such tables are used they usually contain a small number of rows.
Regards,
Mason
|
|
|
 |
25 Aug 2008 16:07:54
|
Mason_S
Senior member
Joined: 1 Apr 2008 09:03:08
Messages: 380
Offline
|
Hi Jon,
I just wanted to update you on this: we are having problems reproducing the connection issue. We do run into the same error without the GROUP BY of course, but GridSQL seems to handle the error gracefully, and it does not affect other sessions or new connections.
I am a bit concerned that we can’t reproduce the resulting connection problem when this error occurs, but we can at least make sure we check for GROUP BY properly.
Thanks,
Mason
|
|
|
 |
26 Aug 2008 11:03:16
|
Jon_K
Member
Joined: 19 Jun 2008 13:00:13
Messages: 28
Offline
|
Mason,
Perhaps it is my system setup.
I'm going to move the coordinator to node 1 and see if that helps.
Jon
UPDATE:
1. Disabled all agents on the underlying nodes
2. Moved coordinator from dedicated server to node 1. Coordinator is reporting node errors correctly
3. Moved coordinator back to dedicated server. Coordinator is reporting node errors correctly
Looks like an agent problem. They're not reporting errors back from the nodes to the specified coordinator. Plus they don't reconnect when restarting the coordinator. :/
This message was edited 3 times. Last update was at 26 Aug 2008 13:33:59
|
|
|
 |
26 Aug 2008 13:41:06
|
Mason_S
Senior member
Joined: 1 Apr 2008 09:03:08
Messages: 380
Offline
|
Hi Jon,
Thank you for trying that out. Sorry again that we cannot reproduce the problem here.
From your config file, these lines were commented out:
#xdb.node.1.port=6455
#xdb.node.1.host=den3db025
#xdb.node.2.port=6455
#xdb.node.2.host=den3db025
If nodes 1 and 2 are on a machine physically separate from the coordinator, you probably do want to uncomment these lines and start an agent for those nodes.
Still, having the coordinator connect to them directly should be OK, just as you saw everything handled properly when all nodes were served from within the coordinator.
Is there anything of interest in the agent log files? Do you mind sending me the agent config files?
Thanks,
Mason
|
|
|
 |
26 Aug 2008 13:51:56
|
Jon_K
Member
Joined: 19 Jun 2008 13:00:13
Messages: 28
Offline
|
Hi Mason,
Thanks again for helping with this.
Here is the config file for Physical Node 2, which houses databases usersN3 and usersN4.
##########################################################################
# Copyright (c) EnterpriseDB Corporation
#
# gridsql_agent.config
#
# This configuration file is used only when agents are running on
# the non-coordinator nodes.
##########################################################################
###
### The coordinator host and port
###
xdb.coordinator.host=den3db025
xdb.coordinator.port=6454
###
### Logging settings
###
log4j.rootLogger=WARN, console
log4j.logger.Server=ALL, console
# A1 is set to be a ConsoleAppender.
log4j.appender.console=org.apache.log4j.RollingFileAppender
# A1 uses PatternLayout.
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%r [%t] %-5p %c %x - %m%n
log4j.appender.console.File=/usr/local/gridsql-agent-1.0/log/agent.log
Nothing interesting in the agent log files, just the connections.
2655488 [Thread-2] INFO Server - Node 3: thread pool is ready
2655488 [Thread-2] INFO Server - Node 3: thread pool is ready
2655489 [Thread-2] WARN com.edb.gridsql.communication.NodeAgent - Node Thread Pool was initialized twice on node 3
2655489 [Thread-2] INFO Server - Node 3: Connection to Coordinator is completed
2655489 [Thread-2] INFO Server - Node 3: Connection to Coordinator is completed
2655846 [pool-1-thread-7] INFO Server - Node 3: connection pool for database users is ready
2655846 [pool-1-thread-7] INFO Server - Node 3: connection pool for database users is ready
|
|
|
 |
27 Aug 2008 09:26:12
|
Mason_S
Senior member
Joined: 1 Apr 2008 09:03:08
Messages: 380
Offline
|
Hi Jon,
Please try adding these lines:
xdb.node.3.port=6455
xdb.node.4.port=6456
to gridsql_agent.config on the system that hosts nodes 3 and 4, and make similar changes for the other agents.
On startup, the agent's connector starts listening on the specified port. If no port is specified, the connector does not listen for incoming connections and the coordinator cannot connect to the agent. The agent still tries to connect to the coordinator on startup and, if that succeeds, the coordinator uses this existing connection to send messages to the node. However, if that connection is broken, because the coordinator is restarted or because of a network problem, the coordinator cannot restore it and the node has to be restarted too.
I am not sure that this will solve the issue, but it is good to do anyway, and it was one difference in the configuration that the engineer had when trying to reproduce the problem.
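If it helps to check several agent config files quickly, here is a small Python sketch (not a GridSQL tool) that reports which node numbers have no active xdb.node.N.port line. The function name and the sample config text are made up for illustration; commented-out lines are treated as absent.

```python
import re


def nodes_missing_port(config_text, node_ids):
    """Return the ids in node_ids that have no active xdb.node.<id>.port line."""
    ports = set()
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#"):
            continue  # commented-out settings do not count
        match = re.match(r"xdb\.node\.(\d+)\.port\s*=\s*(\d+)", line)
        if match:
            ports.add(int(match.group(1)))
    return [node for node in node_ids if node not in ports]


# Illustrative agent config for a host serving nodes 3 and 4,
# with the port for node 4 still commented out.
agent_config = """
xdb.coordinator.host=den3db025
xdb.coordinator.port=6454
xdb.node.3.port=6455
#xdb.node.4.port=6456
"""

print(nodes_missing_port(agent_config, [3, 4]))  # [4]
```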
Thanks,
Mason
|
|
|
 |
27 Aug 2008 14:22:01
|
Jon_K
Member
Joined: 19 Jun 2008 13:00:13
Messages: 28
Offline
|
Hi Mason,
Still no love from the agents to the coordinator. I have disabled the agents on the underlying nodes and am running the coordinator without them for the time being for stability.
I have now gone back to using two machines to test from scratch.
Here is what I've found so far. Sorry, this may get long-winded.
Node1 has the coordinator and database "test" Partition 1
Node2 has "test" Partition2
create table usr_fact(
user_id int,
gender varchar(1)
) PARTITIONING KEY user_id ON ALL;
INSERT INTO usr_fact VALUES (1,'M');
INSERT INTO usr_fact VALUES (2,'F');
INSERT INTO usr_fact VALUES (3,'F');
INSERT INTO usr_fact VALUES (4,'M');
I set up just the coordinator and configured it to connect directly to the postgres instances on 5432 on node1 and node2.
### Connectivity for MetaData database
xdb.metadata.database=XDBSYS
xdb.metadata.dbhost=node1
### The number of nodes in cluster
xdb.nodecount=2
### The hosts of the underlying databases
xdb.node.1.dbhost=node1
xdb.node.2.dbhost=node2
#xdb.node.3.dbhost=127.0.0.1
#xdb.node.4.dbhost=127.0.0.1
### Designate coordinator node number
### In practice, the coordinator node should be the node where
### GridSQL is running.
xdb.coordinator.node=1
console.log says:
2008-08-27 11:42:31,054 - INFO Coordinator: Node 2 is connected
2008-08-27 11:42:31,054 - INFO Coordinator: Node 2 is connected
2008-08-27 11:42:31,056 - INFO Coordinator: Node 1 is connected
2008-08-27 11:42:31,056 - INFO Coordinator: Node 1 is connected
2008-08-27 11:42:31,065 - INFO Node 2: thread pool is ready
2008-08-27 11:42:31,065 - INFO Node 2: thread pool is ready
2008-08-27 11:42:31,065 - INFO Node 1: thread pool is ready
2008-08-27 11:42:31,065 - INFO Node 1: thread pool is ready
2008-08-27 11:42:31,065 - INFO Node 2: Connection to Coordinator is completed
2008-08-27 11:42:31,065 - INFO Node 2: Connection to Coordinator is completed
2008-08-27 11:42:31,065 - INFO Node 1: Connection to Coordinator is completed
2008-08-27 11:42:31,065 - INFO Node 1: Connection to Coordinator is completed
2008-08-27 11:42:31,181 - INFO Node 2: connection pool for database test is ready
2008-08-27 11:42:31,181 - INFO Node 2: connection pool for database test is ready
2008-08-27 11:42:31,184 - INFO Node 1: connection pool for database test is ready
2008-08-27 11:42:31,184 - INFO Node 1: connection pool for database test is ready
2008-08-27 11:42:31,184 - INFO *** Database test is now online
2008-08-27 11:42:31,184 - INFO *** Database test is now online
I then installed the agent via rpm downloaded this morning from the sourceforge.net repository on node2.
rpm -ivh gridsql-agent-1.0-0.noarch.rpm
Changed the config file on node2 for gridsql_agent.config to:
xdb.coordinator.host=node1
Started up the agent by executing the following from the command line
sudo /usr/local/gridsql-agent-1.0/bin/gs-agent.sh -n 2
Changed the following lines in gridsql.config to reflect that I now have an agent running on node2.
### Only for agent version
### Port for node's SocketCommunicator
xdb.node.1.port=6455
xdb.node.1.host=node1
xdb.node.2.port=6455
xdb.node.2.host=node2
#xdb.node.3.port=6455
#xdb.node.3.host=192.168.123.102
#xdb.node.4.port=6455
#xdb.node.4.host=192.168.123.103
### Designate coordinator node
### In practice, the coordinator node should be the node where
### GridSQL is running.
xdb.coordinator.host=node1
xdb.coordinator.port=6454
I shut down and restarted the coordinator using the new config file.
sudo /usr/local/gridsql-1.0/bin/gs-shutdown.sh -f -u <user> -p <password>
sudo /usr/local/gridsql-1.0/bin/gs-server.sh -d test
console.log says:
2008-08-27 11:45:19,354 - INFO Coordinator: Node 1 is connected
2008-08-27 11:45:19,354 - INFO Coordinator: Node 1 is connected
2008-08-27 11:45:19,368 - INFO Node 1: thread pool is ready
2008-08-27 11:45:19,368 - INFO Node 1: thread pool is ready
2008-08-27 11:45:19,369 - INFO Node 1: Connection to Coordinator is completed
2008-08-27 11:45:19,369 - INFO Node 1: Connection to Coordinator is completed
2008-08-27 11:45:19,380 - INFO Coordinator: Node 2 is connected
2008-08-27 11:45:19,380 - INFO Coordinator: Node 2 is connected
2008-08-27 11:45:19,434 - INFO Node 1: connection pool for database test is ready
2008-08-27 11:45:19,434 - INFO Node 1: connection pool for database test is ready
2008-08-27 11:45:19,495 - INFO *** Database test is now online
2008-08-27 11:45:19,495 - INFO *** Database test is now online
From here, I am able to query the database tables as long as I don't do something stupid that causes the underlying node to throw an error.
This works and returns a result:
SELECT gender, count(0) FROM usr_fact GROUP BY 1 order by 1
This causes the underlying node to throw an error, which never makes it back to the coordinator or the psql screen:
SELECT gender, count(0) FROM usr_fact ORDER BY 1
At this point, psql looks like it is trying to fulfill the query, but the console.log shows this:
2008-08-27 12:04:20,204 - ERROR Catching throwable:
java.sql.SQLException: ERROR: column "usr_fact.gender" must appear in the GROUP BY clause or be used in an aggregate function
at com.edb.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2306)
at com.edb.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1931)
at com.edb.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:643)
at com.edb.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:476)
at com.edb.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:390)
at com.edb.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:251)
at com.edb.gridsql.engine.NodeProducerThread.executeQuery(Unknown Source)
at com.edb.gridsql.engine.NodeProducerThread.processStep(Unknown Source)
at com.edb.gridsql.engine.NodeProducerThread.run(Unknown Source)
at java.lang.Thread.run(Thread.java:595)
2008-08-27 12:04:20,204 - ERROR Throwing throwable:
com.edb.gridsql.exception.XDBWrappedSQLException: java.sql.SQLException : ERROR: column "usr_fact.gender" must appear in the GROUP BY clause or be used in an aggregate function
at com.edb.gridsql.engine.NodeThread.handleSqlException(Unknown Source)
at com.edb.gridsql.engine.NodeProducerThread.processStep(Unknown Source)
at com.edb.gridsql.engine.NodeProducerThread.run(Unknown Source)
at java.lang.Thread.run(Thread.java:595)
2008-08-27 12:04:20,204 - ERROR Catching throwable:
com.edb.gridsql.exception.XDBWrappedSQLException: java.sql.SQLException : ERROR: column "usr_fact.gender" must appear in the GROUP BY clause or be used in an aggregate function
at com.edb.gridsql.engine.NodeThread.handleSqlException(Unknown Source)
at com.edb.gridsql.engine.NodeProducerThread.processStep(Unknown Source)
at com.edb.gridsql.engine.NodeProducerThread.run(Unknown Source)
at java.lang.Thread.run(Thread.java:595)
2008-08-27 12:04:20,205 - ERROR Throwing throwable:
com.edb.gridsql.exception.XDBWrappedSQLException: java.sql.SQLException : ERROR: column "usr_fact.gender" must appear in the GROUP BY clause or be used in an aggregate function
at com.edb.gridsql.engine.NodeThread.handleSqlException(Unknown Source)
at com.edb.gridsql.engine.NodeProducerThread.processStep(Unknown Source)
at com.edb.gridsql.engine.NodeProducerThread.run(Unknown Source)
at java.lang.Thread.run(Thread.java:595)
2008-08-27 12:04:20,206 - ERROR Catching throwable:
com.edb.gridsql.exception.XDBWrappedSQLException: java.sql.SQLException : ERROR: column "usr_fact.gender" must appear in the GROUP BY clause or be used in an aggregate function
at com.edb.gridsql.engine.NodeThread.handleSqlException(Unknown Source)
at com.edb.gridsql.engine.NodeProducerThread.processStep(Unknown Source)
at com.edb.gridsql.engine.NodeProducerThread.run(Unknown Source)
at java.lang.Thread.run(Thread.java:595)
2008-08-27 12:04:20,206 - ERROR Throwing throwable:
com.edb.gridsql.exception.XDBMessageMonitorException: Node 1 has aborted execution, cause is: java.sql.SQLException : ERROR: column "usr_fact.gender" must appear in the GROUP BY clause or be used in an aggregate function
at com.edb.gridsql.engine.MessageMonitor.checkMessages(Unknown Source)
at com.edb.gridsql.engine.MultinodeExecutor.executeStep(Unknown Source)
at com.edb.gridsql.queryproc.QueryProcessor.executeQueryStep(Unknown Source)
at com.edb.gridsql.queryproc.QueryProcessor.executeQueryExecPlan(Unknown Source)
at com.edb.gridsql.queryproc.QueryProcessor.execute(Unknown Source)
at com.edb.gridsql.parser.SqlSelect.execute(Unknown Source)
at com.edb.gridsql.engine.ExecutableRequest.execute(Unknown Source)
at com.edb.gridsql.engine.XDBSessionContext.executeRequest(Unknown Source)
at com.edb.gridsql.engine.ServerStatement.execute(Unknown Source)
at com.edb.gridsql.engine.ServerStatement.describe(Unknown Source)
at com.edb.gridsql.engine.XDBSessionContext.describeStatement(Unknown Source)
at com.edb.gridsql.protocol.PgProtocolSession.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
at java.lang.Thread.run(Thread.java:595)
Caused by: com.edb.gridsql.exception.XDBWrappedSQLException: java.sql.SQLException : ERROR: column "usr_fact.gender" must appear in the GROUP BY clause or be used in an aggregate function
at com.edb.gridsql.engine.NodeThread.handleSqlException(Unknown Source)
at com.edb.gridsql.engine.NodeProducerThread.processStep(Unknown Source)
at com.edb.gridsql.engine.NodeProducerThread.run(Unknown Source)
... 1 more
2008-08-27 12:04:20,206 - ERROR Catching throwable:
java.nio.channels.NotYetConnectedException
at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:114)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:139)
at com.edb.gridsql.communication.SocketConnector$ReceivingThread.read(Unknown Source)
at com.edb.gridsql.communication.SocketConnector$ReceivingThread.receive(Unknown Source)
at com.edb.gridsql.communication.AbstractConnector$AbstractReceivingThread.run(Unknown Source)
at java.lang.Thread.run(Thread.java:595)
2008-08-27 12:04:20,208 - ERROR Catching throwable:
java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:512)
at com.edb.gridsql.communication.SocketConnector$SendingThread.send(Unknown Source)
at com.edb.gridsql.communication.AbstractConnector$AbstractSendingThread.run(Unknown Source)
at java.lang.Thread.run(Thread.java:595)
2008-08-27 12:04:20,208 - ERROR Catching throwable:
java.nio.channels.NotYetConnectedException
at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:114)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:139)
at com.edb.gridsql.communication.SocketConnector$ReceivingThread.read(Unknown Source)
at com.edb.gridsql.communication.SocketConnector$ReceivingThread.receive(Unknown Source)
at com.edb.gridsql.communication.AbstractConnector$AbstractReceivingThread.run(Unknown Source)
at java.lang.Thread.run(Thread.java:595)
2008-08-27 12:04:20,210 - ERROR Catching throwable:
java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:512)
at com.edb.gridsql.communication.SocketConnector$SendingThread.send(Unknown Source)
at com.edb.gridsql.communication.AbstractConnector$AbstractSendingThread.sendFailed(Unknown Source)
at com.edb.gridsql.communication.SocketConnector$SendingThread.sendFailed(Unknown Source)
at com.edb.gridsql.communication.AbstractConnector$AbstractSendingThread.run(Unknown Source)
at java.lang.Thread.run(Thread.java:595)
2008-08-27 12:04:20,210 - ERROR Catching throwable:
java.nio.channels.NotYetConnectedException
at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:114)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:139)
at com.edb.gridsql.communication.SocketConnector$ReceivingThread.read(Unknown Source)
at com.edb.gridsql.communication.SocketConnector$ReceivingThread.receive(Unknown Source)
at com.edb.gridsql.communication.AbstractConnector$AbstractReceivingThread.run(Unknown Source)
at java.lang.Thread.run(Thread.java:595)
2008-08-27 12:04:20,212 - ERROR Catching throwable:
java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:512)
at com.edb.gridsql.communication.SocketConnector$SendingThread.send(Unknown Source)
at com.edb.gridsql.communication.AbstractConnector$AbstractSendingThread.sendFailed(Unknown Source)
at com.edb.gridsql.communication.SocketConnector$SendingThread.sendFailed(Unknown Source)
at com.edb.gridsql.communication.AbstractConnector$AbstractSendingThread.run(Unknown Source)
at java.lang.Thread.run(Thread.java:595)
2008-08-27 12:04:20,212 - ERROR Can not deliver message, sending MSG_ABORT back to Agent
2008-08-27 12:04:20,212 - ERROR Can not deliver message, sending MSG_ABORT back to Agent
2008-08-27 12:04:20,292 - ERROR Catching throwable:
com.edb.gridsql.exception.XDBServerException: Failed To Get Results For ( SQL , NodeURL) : ( SELECT "usr_fact"."gender" AS "XCOL1",count(*) AS "XCOL2" FROM "usr_fact" ) eQS Node 1 has aborted execution, cause is: java.sql.SQLException : ERROR: column "usr_fact.gender" must appear in the GROUP BY clause or be used in an aggregate function
at com.edb.gridsql.queryproc.QueryProcessor.executeQueryStep(Unknown Source)
at com.edb.gridsql.queryproc.QueryProcessor.executeQueryExecPlan(Unknown Source)
at com.edb.gridsql.queryproc.QueryProcessor.execute(Unknown Source)
at com.edb.gridsql.parser.SqlSelect.execute(Unknown Source)
at com.edb.gridsql.engine.ExecutableRequest.execute(Unknown Source)
at com.edb.gridsql.engine.XDBSessionContext.executeRequest(Unknown Source)
at com.edb.gridsql.engine.ServerStatement.execute(Unknown Source)
at com.edb.gridsql.engine.ServerStatement.describe(Unknown Source)
at com.edb.gridsql.engine.XDBSessionContext.describeStatement(Unknown Source)
at com.edb.gridsql.protocol.PgProtocolSession.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
at java.lang.Thread.run(Thread.java:595)
Caused by: com.edb.gridsql.exception.XDBMessageMonitorException: Node 1 has aborted execution, cause is: java.sql.SQLException : ERROR: column "usr_fact.gender" must appear in the GROUP BY clause or be used in an aggregate function
at com.edb.gridsql.engine.MessageMonitor.checkMessages(Unknown Source)
at com.edb.gridsql.engine.MultinodeExecutor.executeStep(Unknown Source)
... 13 more
Caused by: com.edb.gridsql.exception.XDBWrappedSQLException: java.sql.SQLException : ERROR: column "usr_fact.gender" must appear in the GROUP BY clause or be used in an aggregate function
at com.edb.gridsql.engine.NodeThread.handleSqlException(Unknown Source)
at com.edb.gridsql.engine.NodeProducerThread.processStep(Unknown Source)
at com.edb.gridsql.engine.NodeProducerThread.run(Unknown Source)
... 1 more
2008-08-27 12:04:20,292 - WARN Message was not consumed: NodeMessage[Type:MSG_ABORT, From:2, To: [0], Session: 2, Request: 9]
2008-08-27 12:04:20,292 - WARN Message was not consumed: NodeMessage[Type:MSG_ABORT, From:2, To: [0], Session: 2, Request: 9]
At this point, psql is completely useless.
Ctrl+C spits out "Cancel request sent" every time I hit it.
The only way out is Ctrl+Z.
At this point, I can psql back into the coordinator.
psql -U <user> -p 6453 -d test -h localhost
From here, I try to run a query
select count(*) FROM usr_fact
but psql stops responding again.
I then have to kill the java processes on both nodes.
killall java
And restart
sudo /usr/local/gridsql-1.0/bin/gs-server.sh -d test
sudo /usr/local/gridsql-agent-1.0/bin/gs-agent.sh -n 2
Just to summarize:
I'm able to run the grid without the agents and have it report back errors from the underlying nodes without incident.
Adding in the agent causes node errors not to be reported back to psql, which locks up the gridsql system.
Sorry for the long string of steps and thanks for taking the time to look into this.
Jon
|
|
|
 |
28 Aug 2008 10:29:04
|
Mason_S
Senior member
Joined: 1 Apr 2008 09:03:08
Messages: 380
Offline
|
Hi Jon,
We are still having problems reproducing the problem.
If it is not too much trouble, can you set the log level to DEBUG (instead of INFO) on both the server and the agent and retry, so that we get more logging output?
Thanks,
Mason
|
|
|
 |
29 Aug 2008 11:30:44
|
Mason_S
Senior member
Joined: 1 Apr 2008 09:03:08
Messages: 380
Offline
|
Jon,
Just letting you know that we did at least address detecting the missing GROUP BY sooner:
http://forums.enterprisedb.com/posts/list/1422.page
Thanks,
Mason
|
|
|
 |
2 Sep 2008 09:15:51
|
Mason_S
Senior member
Joined: 1 Apr 2008 09:03:08
Messages: 380
Offline
|
Hi Jon,
Someone here was able to reproduce the problem after all. He said that the problem disappeared when he added the node ids to agent config files, as discussed earlier in this thread.
Thanks,
Mason
|
|
|
 |
2 Sep 2008 10:53:11
|
Mason_S
Senior member
Joined: 1 Apr 2008 09:03:08
Messages: 380
Offline
|
A clarification to the previous statement: the problem disappears when port information for the nodes is added to the agent config file.
Thanks,
Mason
|
|
|
 |
9 Jun 2010 21:51:05
|
null
Member
Joined: 2 Apr 2008 17:21:41
Messages: 78
Offline
|
After installing GridSQL 2.0 and configuring it, I start gs-server.sh and then run gs-dbstart.sh -d test.
It always produces the error 'can not bring db test online'.
|
|
|
 |
10 Jun 2010 10:00:12
|
Mason_S
Senior member
Joined: 1 Apr 2008 09:03:08
Messages: 380
Offline
|
Did you first run gs-createmddb.sh before starting?
Also, before using the database "test" you must first create it with gs-createdb.sh.
Regards,
Mason
|
|
|
 |
|
|
|
|