EnterpriseDB: The Enterprise Postgres Company Postgres Plus Forums: The PostgreSQL Open Source Database from EnterpriseDB

GridSQL connection issues

Jon_K

Member

Joined: 19 Jun 2008 13:00:13
Messages: 28
Offline

Hi All,

I'm running into a couple of problems with the GridSQL coordinator.

1. The coordinator stops accepting new connections and allows only one connection at a time. In this case the psql client receives an error and exits gracefully.

2. The coordinator blackholes the psql/PHP connection attempt and never returns a status to the client.

This is what I see in console.log around the time it happens:

2008-08-18 23:55:04,504 - ERROR Throwing throwable:
com.edb.gridsql.exception.XDBMessageMonitorException: Server has aborted execution, cause is: Timeout expired (3600000)
at com.edb.gridsql.engine.MessageMonitor.checkMessages(Unknown Source)
at com.edb.gridsql.engine.MultinodeExecutor.executeStep(Unknown Source)
at com.edb.gridsql.queryproc.QueryProcessor.executeQueryStep(Unknown Source)
at com.edb.gridsql.queryproc.QueryProcessor.executeQueryExecPlan(Unknown Source)
at com.edb.gridsql.queryproc.QueryProcessor.execute(Unknown Source)
at com.edb.gridsql.parser.SqlInsertTable.fillTempTable(Unknown Source)
at com.edb.gridsql.parser.SqlModifyTable.execute(Unknown Source)
at com.edb.gridsql.engine.ExecutableRequest.execute(Unknown Source)
at com.edb.gridsql.engine.XDBSessionContext.executeRequest(Unknown Source)
at com.edb.gridsql.engine.ServerStatement.execute(Unknown Source)
at com.edb.gridsql.engine.ServerStatement.describe(Unknown Source)
at com.edb.gridsql.engine.XDBSessionContext.describeStatement(Unknown Source)
at com.edb.gridsql.protocol.PgProtocolSession.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
at java.lang.Thread.run(Thread.java:595)

Both issues require bouncing the coordinator and the agents, one after the other.

Thanks

Jon King



Mason_S

Senior member

Joined: 1 Apr 2008 09:03:08
Messages: 380
Offline

Hi Jon,

When these things happen, is it after the system has been up for a long time? Does it happen as a result of a malformed query or any other GridSQL error (that is, perhaps GridSQL does not handle some other error properly and as a result does not free connections)?

And just to make sure, xdb.maxconnections and xdb.default.threads.pool.maxsize (etc) are set to be more than 1?

When this happens, if you run "SHOW STATEMENTS;", does it show any other activity?

Also, which version are you using?

For 2), are you executing a particularly long query, one that may run for more than one hour? The timeout is in milliseconds, so 3600000 is one hour. If so, you can increase this value by setting xdb.messagemonitor.timeout.millis to a higher number (try 36000000 (10 hours) or more). Perhaps this is related to (1), in that multiple long-running queries timed out and could not be cancelled properly on the underlying databases.
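
As a quick sanity check on the numbers above, here is a minimal Python sketch of the conversion between hours and the millisecond values used by xdb.messagemonitor.timeout.millis. The helper names are made up for illustration and are not part of GridSQL.

```python
# Convert between hours and the millisecond values used by
# xdb.messagemonitor.timeout.millis. Helper names are illustrative only.
MS_PER_HOUR = 60 * 60 * 1000


def hours_to_millis(hours):
    """Return the timeout in milliseconds for a given number of hours."""
    return int(hours * MS_PER_HOUR)


def millis_to_hours(millis):
    """Return the timeout in hours for a given number of milliseconds."""
    return millis / MS_PER_HOUR


print(millis_to_hours(3600000))  # the timeout reported in console.log -> 1.0
print(hours_to_millis(10))       # the suggested override -> 36000000
```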

Sorry for the trouble.

Thanks,

Mason
Jon_K

Member

Joined: 19 Jun 2008 13:00:13
Messages: 28
Offline

Hi Mason,

I am able to get the grid to stop responding when I run the following query, which is missing a GROUP BY clause:

SELECT silo_id, count(*) FROM user_fact;

This results in the following console.log entry:

2008-08-22 10:42:46,668 - ERROR Throwing throwable:
com.edb.gridsql.exception.XDBMessageMonitorException: Node 5 has aborted execution, cause is: java.sql.SQLException : ERROR: column "user_fact.silo_id" must appear in the GROUP BY clause or be used in an aggregate function
at com.edb.gridsql.engine.MessageMonitor.checkMessages(Unknown Source)
at com.edb.gridsql.engine.MultinodeExecutor.executeStep(Unknown Source)
at com.edb.gridsql.queryproc.QueryProcessor.executeQueryStep(Unknown Source)
at com.edb.gridsql.queryproc.QueryProcessor.executeQueryExecPlan(Unknown Source)
at com.edb.gridsql.queryproc.QueryProcessor.execute(Unknown Source)
at com.edb.gridsql.parser.SqlSelect.execute(Unknown Source)
at com.edb.gridsql.engine.ExecutableRequest.execute(Unknown Source)
at com.edb.gridsql.engine.XDBSessionContext.executeRequest(Unknown Source)
at com.edb.gridsql.engine.ServerStatement.execute(Unknown Source)
at com.edb.gridsql.engine.ServerStatement.describe(Unknown Source)
at com.edb.gridsql.engine.XDBSessionContext.describeStatement(Unknown Source)
at com.edb.gridsql.protocol.PgProtocolSession.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
at java.lang.Thread.run(Thread.java:595)
Caused by: com.edb.gridsql.exception.XDBWrappedSQLException: java.sql.SQLException : ERROR: column "user_fact.silo_id" must appear in the GROUP BY clause or be used in an aggregate function
at com.edb.gridsql.engine.NodeThread.handleSqlException(Unknown Source)
at com.edb.gridsql.engine.NodeProducerThread.processStep(Unknown Source)
at com.edb.gridsql.engine.NodeProducerThread.run(Unknown Source)
... 1 more
2008-08-22 10:42:46,672 - WARN Message was not consumed: NodeMessage[Type:MSG_ABORT, From:2, To: [0], Session: 1, Request: 16]
2008-08-22 10:42:46,672 - WARN Message was not consumed: NodeMessage[Type:MSG_ABORT, From:2, To: [0], Session: 1, Request: 16]
2008-08-22 10:42:46,682 - WARN Message was not consumed: NodeMessage[Type:MSG_ABORT, From:1, To: [0], Session: 1, Request: 16]
2008-08-22 10:42:46,682 - WARN Message was not consumed: NodeMessage[Type:MSG_ABORT, From:1, To: [0], Session: 1, Request: 16]
2008-08-22 10:42:46,682 - WARN Message was not consumed: NodeMessage[Type:MSG_ABORT, From:8, To: [0], Session: 1, Request: 16]
2008-08-22 10:42:46,682 - WARN Message was not consumed: NodeMessage[Type:MSG_ABORT, From:8, To: [0], Session: 1, Request: 16]
2008-08-22 10:42:46,682 - WARN Message was not consumed: NodeMessage[Type:MSG_ABORT, From:3, To: [0], Session: 1, Request: 16]
2008-08-22 10:42:46,682 - WARN Message was not consumed: NodeMessage[Type:MSG_ABORT, From:3, To: [0], Session: 1, Request: 16]
2008-08-22 10:42:46,682 - WARN Message was not consumed: NodeMessage[Type:MSG_ABORT, From:6, To: [0], Session: 1, Request: 16]
2008-08-22 10:42:46,682 - WARN Message was not consumed: NodeMessage[Type:MSG_ABORT, From:6, To: [0], Session: 1, Request: 16]
2008-08-22 10:42:46,682 - WARN Message was not consumed: NodeMessage[Type:MSG_ABORT, From:4, To: [0], Session: 1, Request: 16]
2008-08-22 10:42:46,682 - WARN Message was not consumed: NodeMessage[Type:MSG_ABORT, From:4, To: [0], Session: 1, Request: 16]
2008-08-22 10:42:46,684 - WARN Message was not consumed: NodeMessage[Type:MSG_ABORT, From:7, To: [0], Session: 1, Request: 16]
2008-08-22 10:42:46,684 - WARN Message was not consumed: NodeMessage[Type:MSG_ABORT, From:7, To: [0], Session: 1, Request: 16]

After this the console stops responding.

I'm running version 1.0.


xdb.default.threads.pool.initsize=5
xdb.default.threads.pool.maxsize=10

The coordinator sits on a standalone server with 4 underlying physical nodes.

Here is my config file

###########################################################################
# Copyright (c) 2008 EnterpriseDB Corporation
#
# gridsql.config
#
# GridSQL configuration file
###########################################################################


###
### Server settings
###

xdb.port=6453
xdb.maxconnections=10


###
### Node & JDBC Pool configuration
###

### Set defaults for all nodes and MetaData database.
### These can be overriden.

xdb.default.dbusername=*****
xdb.default.dbpassword=*****

xdb.default.dbport=5432


### Connection thread defaults for each node
### Note that these are pooled, so the number of clients connected
### to GridSQL can be greater than pool size.

xdb.default.threads.pool.initsize=5
xdb.default.threads.pool.maxsize=10


### Connectivity for MetaData database

xdb.metadata.database=XDBSYS
xdb.metadata.dbhost=den3db025

### The number of nodes in cluster

xdb.nodecount=8

### The hosts of the underlying databases

xdb.node.1.dbhost=den3db025
xdb.node.2.dbhost=den3db025
xdb.node.3.dbhost=den3db026
xdb.node.4.dbhost=den3db026
xdb.node.5.dbhost=den3db027
xdb.node.6.dbhost=den3db027
xdb.node.7.dbhost=den3db028
xdb.node.8.dbhost=den3db028

### Designate coordinator node number
### In practice, the coordinator node should be the node where
### GridSQL is running.

xdb.coordinator.node=1

### The next few sections are required when wanting to run agents on the nodes.
### Uncomment them and modify to communicate with agents
###

### Only for agent version
### Port for node's SocketCommunicator

#xdb.node.1.port=6455
#xdb.node.1.host=den3db025
#xdb.node.2.port=6455
#xdb.node.2.host=den3db025
xdb.node.3.port=6455
xdb.node.3.host=den3db026
xdb.node.4.port=6456
xdb.node.4.host=den3db026
xdb.node.5.port=6455
xdb.node.5.host=den3db027
xdb.node.6.port=6456
xdb.node.6.host=den3db027
xdb.node.7.port=6455
xdb.node.7.host=den3db028
xdb.node.8.port=6456
xdb.node.8.host=den3db028

### Designate coordinator node
### In practice, the coordinator node should be the node where
### GridSQL is running.

xdb.coordinator.host=den3dbprx01
xdb.coordinator.port=6454

# Specify protocol types.
# Can use local connection between coordinator and node 1,
# since they are the same system

#xdb.connector.0.1=0
#xdb.connector.1.0=0
#xdb.connector.1.2=0
#xdb.connector.2.1=0
Mason_S

Senior member

Joined: 1 Apr 2008 09:03:08
Messages: 380
Offline

Thanks for the info.

It sounds like we have two issues to investigate. First, when a problem occurs, it should be handled gracefully (pooled connections cleaned up and freed); this is an area we have previously spent time on and verified. Second, when doing semantic checks, we should catch the case where an expected GROUP BY is missing before we parallelize and send an invalid query to the backends.

Thanks,

Mason
Jon_K

Member

Joined: 19 Jun 2008 13:00:13
Messages: 28
Offline

Does the fact that I have xdb.coordinator.node set to 1 when the coordinator doesn't really reside on that node raise any issues?

This message was edited 1 time. Last update was at 22 Aug 2008 13:59:41

Mason_S

Senior member

Joined: 1 Apr 2008 09:03:08
Messages: 380
Offline

Hi Jon,

xdb.coordinator.node can indeed be separate from where the coordinator process is running.

It really just indicates a preference for which underlying database instance to use in a couple of cases, such as when a single consolidated intermediate table is created. This is a very rare occurrence, however, as nearly all steps try to make use of all available nodes, and when such tables are used they usually contain a small number of rows.

Regards,

Mason
Mason_S

Senior member

Joined: 1 Apr 2008 09:03:08
Messages: 380
Offline

Hi Jon,

I just wanted to update you on this: we are having problems reproducing the connection issue. We do run into the same error without the GROUP BY of course, but GridSQL seems to handle the error gracefully, and it does not affect other sessions or new connections.

I am a bit concerned that we can’t reproduce the resulting connection problem when this error occurs, but we can at least make sure we check for GROUP BY properly.

Thanks,

Mason

Jon_K

Member

Joined: 19 Jun 2008 13:00:13
Messages: 28
Offline

Mason,

Perhaps it is my system setup.

I'm going to move the coordinator to node 1 and see if that helps.

Jon

UPDATE:

1. Disabled all agents on the underlying nodes
2. Moved coordinator from dedicated server to node 1. Coordinator is reporting node errors correctly
3. Moved coordinator back to dedicated server. Coordinator is reporting node errors correctly

Looks like an agent problem. They're not reporting errors back from the nodes to the specified coordinator. Plus they don't reconnect when restarting the coordinator. :/

This message was edited 3 times. Last update was at 26 Aug 2008 13:33:59

Mason_S

Senior member

Joined: 1 Apr 2008 09:03:08
Messages: 380
Offline

Hi Jon,

Thank you for trying that out. Sorry again that we cannot reproduce the problem here.

From your config file, these lines were commented out:

#xdb.node.1.port=6455
#xdb.node.1.host=den3db025
#xdb.node.2.port=6455
#xdb.node.2.host=den3db025

If 1 and 2 were physically separate from the coordinator, then you probably do want to uncomment these and kick off an agent for these.

Still, having the coordinator connect to these nodes directly should be OK, just as you saw that everything was handled properly when all nodes were handled within the coordinator.

Is there anything of interest in the agent log files? Do you mind sending me the agent config files?

Thanks,

Mason

Jon_K

Member

Joined: 19 Jun 2008 13:00:13
Messages: 28
Offline

Hi Mason,

Thanks again for helping with this.

Here is the config file for Physical Node 2, which houses databases usersN3 and usersN4.

##########################################################################
# Copyright (c) EnterpriseDB Corporation
#
# gridsql_agent.config
#
# This configuration file is used only when agents are running on
# the non-coordinator nodes.
##########################################################################

###
### The coordinator host and port
###

xdb.coordinator.host=den3db025
xdb.coordinator.port=6454


###
### Logging settings
###

log4j.rootLogger=WARN, console

log4j.logger.Server=ALL, console

# A1 is set to be a ConsoleAppender.
log4j.appender.console=org.apache.log4j.RollingFileAppender

# A1 uses PatternLayout.
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%r [%t] %-5p %c %x - %m%n
log4j.appender.console.File=/usr/local/gridsql-agent-1.0/log/agent.log


Nothing interesting in the agent log files, just the connections.

2655488 [Thread-2] INFO Server - Node 3: thread pool is ready
2655488 [Thread-2] INFO Server - Node 3: thread pool is ready
2655489 [Thread-2] WARN com.edb.gridsql.communication.NodeAgent - Node Thread Pool was initialized twice on node 3
2655489 [Thread-2] INFO Server - Node 3: Connection to Coordinator is completed
2655489 [Thread-2] INFO Server - Node 3: Connection to Coordinator is completed
2655846 [pool-1-thread-7] INFO Server - Node 3: connection pool for database users is ready
2655846 [pool-1-thread-7] INFO Server - Node 3: connection pool for database users is ready


Mason_S

Senior member

Joined: 1 Apr 2008 09:03:08
Messages: 380
Offline

Hi Jon,

Please try adding these lines:

xdb.node.3.port=6455
xdb.node.4.port=6456

to gridsql_agent.config on the system that hosts nodes 3 and 4, and make similar changes for the other agents.

On startup, the agent's connector starts listening on the specified port. If the port setting is not present, the connector does not listen for incoming connections and the coordinator won't be able to connect to the agent. The agent still tries to connect to the coordinator on startup and, if successful, the coordinator uses this existing connection to send messages to the node. However, if that connection is broken because the coordinator is restarted or because of a network problem, the coordinator won't be able to restore it, and the node has to be restarted too.
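
One way to check an agent config for this before starting the agent is sketched below in Python. It only looks for active (uncommented) xdb.node.N.port lines of the form shown above; the function name and the sample text are illustrative assumptions, not anything shipped with GridSQL.

```python
import re


def nodes_missing_port(config_text, node_ids):
    """Return the node ids that have no active xdb.node.<id>.port line."""
    configured = set()
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out settings
        match = re.match(r"xdb\.node\.(\d+)\.port\s*=\s*\d+", line)
        if match:
            configured.add(int(match.group(1)))
    return [n for n in node_ids if n not in configured]


# Sample agent config text (hypothetical): node 4's port is commented out.
agent_config = """
xdb.coordinator.host=den3db025
xdb.coordinator.port=6454
xdb.node.3.port=6455
#xdb.node.4.port=6456
"""

print(nodes_missing_port(agent_config, [3, 4]))  # -> [4]
```

An empty list means every node hosted by that agent has a port for its connector to listen on.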

I am not sure that this will solve the issue, but it is good to do anyway, and it was one difference in the configuration that the engineer had when trying to reproduce the problem.

Thanks,

Mason


Jon_K

Member

Joined: 19 Jun 2008 13:00:13
Messages: 28
Offline

Hi Mason,

Still no love from the agents to the coordinator. I have disabled the agents on the underlying nodes and am running the coordinator without them for the time being for stability.

I have now gone back to using two machines to test from scratch.

Here is what I've found so far. Sorry, this may get long-winded.

Node1 has the coordinator and database "test" Partition 1
Node2 has "test" Partition2


create table usr_fact(
user_id int,
gender varchar(1)
) PARTITIONING KEY user_id ON ALL;

INSERT INTO usr_fact VALUES (1,'M');
INSERT INTO usr_fact VALUES (2,'F');
INSERT INTO usr_fact VALUES (3,'F');
INSERT INTO usr_fact VALUES (4,'M');


I set up just the coordinator and configured it to connect directly to the postgres instances on 5432 on node1 and node2.

### Connectivity for MetaData database

xdb.metadata.database=XDBSYS
xdb.metadata.dbhost=node1

### The number of nodes in cluster

xdb.nodecount=2

### The hosts of the underlying databases

xdb.node.1.dbhost=node1
xdb.node.2.dbhost=node2
#xdb.node.3.dbhost=127.0.0.1
#xdb.node.4.dbhost=127.0.0.1


### Designate coordinator node number
### In practice, the coordinator node should be the node where
### GridSQL is running.

xdb.coordinator.node=1


console.log says:

2008-08-27 11:42:31,054 - INFO Coordinator: Node 2 is connected
2008-08-27 11:42:31,054 - INFO Coordinator: Node 2 is connected
2008-08-27 11:42:31,056 - INFO Coordinator: Node 1 is connected
2008-08-27 11:42:31,056 - INFO Coordinator: Node 1 is connected
2008-08-27 11:42:31,065 - INFO Node 2: thread pool is ready
2008-08-27 11:42:31,065 - INFO Node 2: thread pool is ready
2008-08-27 11:42:31,065 - INFO Node 1: thread pool is ready
2008-08-27 11:42:31,065 - INFO Node 1: thread pool is ready
2008-08-27 11:42:31,065 - INFO Node 2: Connection to Coordinator is completed
2008-08-27 11:42:31,065 - INFO Node 2: Connection to Coordinator is completed
2008-08-27 11:42:31,065 - INFO Node 1: Connection to Coordinator is completed
2008-08-27 11:42:31,065 - INFO Node 1: Connection to Coordinator is completed
2008-08-27 11:42:31,181 - INFO Node 2: connection pool for database test is ready
2008-08-27 11:42:31,181 - INFO Node 2: connection pool for database test is ready
2008-08-27 11:42:31,184 - INFO Node 1: connection pool for database test is ready
2008-08-27 11:42:31,184 - INFO Node 1: connection pool for database test is ready
2008-08-27 11:42:31,184 - INFO *** Database test is now online
2008-08-27 11:42:31,184 - INFO *** Database test is now online


I then installed the agent via rpm downloaded this morning from the sourceforge.net repository on node2.

rpm -ivh gridsql-agent-1.0-0.noarch.rpm


Changed the config file on node2 for gridsql_agent.config to:

xdb.coordinator.host=node1


Started up the agent by executing the following from the command line

sudo /usr/local/gridsql-agent-1.0/bin/gs-agent.sh -n 2


Changed the following lines in gridsql.config to reflect that I now have an agent running on node2.


### Only for agent version
### Port for node's SocketCommunicator

xdb.node.1.port=6455
xdb.node.1.host=node1
xdb.node.2.port=6455
xdb.node.2.host=node2
#xdb.node.3.port=6455
#xdb.node.3.host=192.168.123.102
#xdb.node.4.port=6455
#xdb.node.4.host=192.168.123.103


### Designate coordinator node
### In practice, the coordinator node should be the node where
### GridSQL is running.

xdb.coordinator.host=node1
xdb.coordinator.port=6454


I shut down and restarted the coordinator using the new config file.


sudo /usr/local/gridsql-1.0/bin/gs-shutdown.sh -f -u <user> -p <password>

sudo /usr/local/gridsql-1.0/bin/gs-server.sh -d test


console.log says:

2008-08-27 11:45:19,354 - INFO Coordinator: Node 1 is connected
2008-08-27 11:45:19,354 - INFO Coordinator: Node 1 is connected
2008-08-27 11:45:19,368 - INFO Node 1: thread pool is ready
2008-08-27 11:45:19,368 - INFO Node 1: thread pool is ready
2008-08-27 11:45:19,369 - INFO Node 1: Connection to Coordinator is completed
2008-08-27 11:45:19,369 - INFO Node 1: Connection to Coordinator is completed
2008-08-27 11:45:19,380 - INFO Coordinator: Node 2 is connected
2008-08-27 11:45:19,380 - INFO Coordinator: Node 2 is connected
2008-08-27 11:45:19,434 - INFO Node 1: connection pool for database test is ready
2008-08-27 11:45:19,434 - INFO Node 1: connection pool for database test is ready
2008-08-27 11:45:19,495 - INFO *** Database test is now online
2008-08-27 11:45:19,495 - INFO *** Database test is now online


From here, I am able to query the database tables as long as I don't do something stupid that causes the underlying node to throw an error.

This works and returns a result:
SELECT gender, count(0) FROM usr_fact GROUP BY 1 order by 1


This causes the underlying node to throw an error which never makes it back to the coordinator or the psql screen:

SELECT gender, count(0) FROM usr_fact ORDER BY 1


At this point, psql looks like it is trying to fulfill the query, but the console.log shows this:


2008-08-27 12:04:20,204 - ERROR Catching throwable:
java.sql.SQLException: ERROR: column "usr_fact.gender" must appear in the GROUP BY clause or be used in an aggregate function
at com.edb.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2306)
at com.edb.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1931)
at com.edb.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:643)
at com.edb.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:476)
at com.edb.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:390)
at com.edb.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:251)
at com.edb.gridsql.engine.NodeProducerThread.executeQuery(Unknown Source)
at com.edb.gridsql.engine.NodeProducerThread.processStep(Unknown Source)
at com.edb.gridsql.engine.NodeProducerThread.run(Unknown Source)
at java.lang.Thread.run(Thread.java:595)
2008-08-27 12:04:20,204 - ERROR Throwing throwable:
com.edb.gridsql.exception.XDBWrappedSQLException: java.sql.SQLException : ERROR: column "usr_fact.gender" must appear in the GROUP BY clause or be used in an aggregate function
at com.edb.gridsql.engine.NodeThread.handleSqlException(Unknown Source)
at com.edb.gridsql.engine.NodeProducerThread.processStep(Unknown Source)
at com.edb.gridsql.engine.NodeProducerThread.run(Unknown Source)
at java.lang.Thread.run(Thread.java:595)
2008-08-27 12:04:20,204 - ERROR Catching throwable:
com.edb.gridsql.exception.XDBWrappedSQLException: java.sql.SQLException : ERROR: column "usr_fact.gender" must appear in the GROUP BY clause or be used in an aggregate function
at com.edb.gridsql.engine.NodeThread.handleSqlException(Unknown Source)
at com.edb.gridsql.engine.NodeProducerThread.processStep(Unknown Source)
at com.edb.gridsql.engine.NodeProducerThread.run(Unknown Source)
at java.lang.Thread.run(Thread.java:595)
2008-08-27 12:04:20,205 - ERROR Throwing throwable:
com.edb.gridsql.exception.XDBWrappedSQLException: java.sql.SQLException : ERROR: column "usr_fact.gender" must appear in the GROUP BY clause or be used in an aggregate function
at com.edb.gridsql.engine.NodeThread.handleSqlException(Unknown Source)
at com.edb.gridsql.engine.NodeProducerThread.processStep(Unknown Source)
at com.edb.gridsql.engine.NodeProducerThread.run(Unknown Source)
at java.lang.Thread.run(Thread.java:595)
2008-08-27 12:04:20,206 - ERROR Catching throwable:
com.edb.gridsql.exception.XDBWrappedSQLException: java.sql.SQLException : ERROR: column "usr_fact.gender" must appear in the GROUP BY clause or be used in an aggregate function
at com.edb.gridsql.engine.NodeThread.handleSqlException(Unknown Source)
at com.edb.gridsql.engine.NodeProducerThread.processStep(Unknown Source)
at com.edb.gridsql.engine.NodeProducerThread.run(Unknown Source)
at java.lang.Thread.run(Thread.java:595)
2008-08-27 12:04:20,206 - ERROR Throwing throwable:
com.edb.gridsql.exception.XDBMessageMonitorException: Node 1 has aborted execution, cause is: java.sql.SQLException : ERROR: column "usr_fact.gender" must appear in the GROUP BY clause or be used in an aggregate function
at com.edb.gridsql.engine.MessageMonitor.checkMessages(Unknown Source)
at com.edb.gridsql.engine.MultinodeExecutor.executeStep(Unknown Source)
at com.edb.gridsql.queryproc.QueryProcessor.executeQueryStep(Unknown Source)
at com.edb.gridsql.queryproc.QueryProcessor.executeQueryExecPlan(Unknown Source)
at com.edb.gridsql.queryproc.QueryProcessor.execute(Unknown Source)
at com.edb.gridsql.parser.SqlSelect.execute(Unknown Source)
at com.edb.gridsql.engine.ExecutableRequest.execute(Unknown Source)
at com.edb.gridsql.engine.XDBSessionContext.executeRequest(Unknown Source)
at com.edb.gridsql.engine.ServerStatement.execute(Unknown Source)
at com.edb.gridsql.engine.ServerStatement.describe(Unknown Source)
at com.edb.gridsql.engine.XDBSessionContext.describeStatement(Unknown Source)
at com.edb.gridsql.protocol.PgProtocolSession.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
at java.lang.Thread.run(Thread.java:595)
Caused by: com.edb.gridsql.exception.XDBWrappedSQLException: java.sql.SQLException : ERROR: column "usr_fact.gender" must appear in the GROUP BY clause or be used in an aggregate function
at com.edb.gridsql.engine.NodeThread.handleSqlException(Unknown Source)
at com.edb.gridsql.engine.NodeProducerThread.processStep(Unknown Source)
at com.edb.gridsql.engine.NodeProducerThread.run(Unknown Source)
... 1 more
2008-08-27 12:04:20,206 - ERROR Catching throwable:
java.nio.channels.NotYetConnectedException
at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:114)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:139)
at com.edb.gridsql.communication.SocketConnector$ReceivingThread.read(Unknown Source)
at com.edb.gridsql.communication.SocketConnector$ReceivingThread.receive(Unknown Source)
at com.edb.gridsql.communication.AbstractConnector$AbstractReceivingThread.run(Unknown Source)
at java.lang.Thread.run(Thread.java:595)
2008-08-27 12:04:20,208 - ERROR Catching throwable:
java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:512)
at com.edb.gridsql.communication.SocketConnector$SendingThread.send(Unknown Source)
at com.edb.gridsql.communication.AbstractConnector$AbstractSendingThread.run(Unknown Source)
at java.lang.Thread.run(Thread.java:595)
2008-08-27 12:04:20,208 - ERROR Catching throwable:
java.nio.channels.NotYetConnectedException
at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:114)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:139)
at com.edb.gridsql.communication.SocketConnector$ReceivingThread.read(Unknown Source)
at com.edb.gridsql.communication.SocketConnector$ReceivingThread.receive(Unknown Source)
at com.edb.gridsql.communication.AbstractConnector$AbstractReceivingThread.run(Unknown Source)
at java.lang.Thread.run(Thread.java:595)
2008-08-27 12:04:20,210 - ERROR Catching throwable:
java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:512)
at com.edb.gridsql.communication.SocketConnector$SendingThread.send(Unknown Source)
at com.edb.gridsql.communication.AbstractConnector$AbstractSendingThread.sendFailed(Unknown Source)
at com.edb.gridsql.communication.SocketConnector$SendingThread.sendFailed(Unknown Source)
at com.edb.gridsql.communication.AbstractConnector$AbstractSendingThread.run(Unknown Source)
at java.lang.Thread.run(Thread.java:595)
2008-08-27 12:04:20,210 - ERROR Catching throwable:
java.nio.channels.NotYetConnectedException
at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:114)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:139)
at com.edb.gridsql.communication.SocketConnector$ReceivingThread.read(Unknown Source)
at com.edb.gridsql.communication.SocketConnector$ReceivingThread.receive(Unknown Source)
at com.edb.gridsql.communication.AbstractConnector$AbstractReceivingThread.run(Unknown Source)
at java.lang.Thread.run(Thread.java:595)
2008-08-27 12:04:20,212 - ERROR Catching throwable:
java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:512)
at com.edb.gridsql.communication.SocketConnector$SendingThread.send(Unknown Source)
at com.edb.gridsql.communication.AbstractConnector$AbstractSendingThread.sendFailed(Unknown Source)
at com.edb.gridsql.communication.SocketConnector$SendingThread.sendFailed(Unknown Source)
at com.edb.gridsql.communication.AbstractConnector$AbstractSendingThread.run(Unknown Source)
at java.lang.Thread.run(Thread.java:595)
2008-08-27 12:04:20,212 - ERROR Can not deliver message, sending MSG_ABORT back to Agent
2008-08-27 12:04:20,212 - ERROR Can not deliver message, sending MSG_ABORT back to Agent
2008-08-27 12:04:20,292 - ERROR Catching throwable:
com.edb.gridsql.exception.XDBServerException: Failed To Get Results For ( SQL , NodeURL) : ( SELECT "usr_fact"."gender" AS "XCOL1",count(*) AS "XCOL2" FROM "usr_fact" ) eQS Node 1 has aborted execution, cause is: java.sql.SQLException : ERROR: column "usr_fact.gender" must appear in the GROUP BY clause or be used in an aggregate function
at com.edb.gridsql.queryproc.QueryProcessor.executeQueryStep(Unknown Source)
at com.edb.gridsql.queryproc.QueryProcessor.executeQueryExecPlan(Unknown Source)
at com.edb.gridsql.queryproc.QueryProcessor.execute(Unknown Source)
at com.edb.gridsql.parser.SqlSelect.execute(Unknown Source)
at com.edb.gridsql.engine.ExecutableRequest.execute(Unknown Source)
at com.edb.gridsql.engine.XDBSessionContext.executeRequest(Unknown Source)
at com.edb.gridsql.engine.ServerStatement.execute(Unknown Source)
at com.edb.gridsql.engine.ServerStatement.describe(Unknown Source)
at com.edb.gridsql.engine.XDBSessionContext.describeStatement(Unknown Source)
at com.edb.gridsql.protocol.PgProtocolSession.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
at java.lang.Thread.run(Thread.java:595)
Caused by: com.edb.gridsql.exception.XDBMessageMonitorException: Node 1 has aborted execution, cause is: java.sql.SQLException : ERROR: column "usr_fact.gender" must appear in the GROUP BY clause or be used in an aggregate function
at com.edb.gridsql.engine.MessageMonitor.checkMessages(Unknown Source)
at com.edb.gridsql.engine.MultinodeExecutor.executeStep(Unknown Source)
... 13 more
Caused by: com.edb.gridsql.exception.XDBWrappedSQLException: java.sql.SQLException : ERROR: column "usr_fact.gender" must appear in the GROUP BY clause or be used in an aggregate function
at com.edb.gridsql.engine.NodeThread.handleSqlException(Unknown Source)
at com.edb.gridsql.engine.NodeProducerThread.processStep(Unknown Source)
at com.edb.gridsql.engine.NodeProducerThread.run(Unknown Source)
... 1 more
2008-08-27 12:04:20,292 - WARN Message was not consumed: NodeMessage[Type:MSG_ABORT, From:2, To: [0], Session: 2, Request: 9]
2008-08-27 12:04:20,292 - WARN Message was not consumed: NodeMessage[Type:MSG_ABORT, From:2, To: [0], Session: 2, Request: 9]


At this point, psql is completely useless.

Ctrl+C spits out "Cancel request sent" every time I hit it.

The only way out is Ctrl+Z.

At this point, I can psql back into the coordinator.

psql -U <user> -p 6453 -d test -h localhost


From here, I try to run a query

select count(*) FROM usr_fact


but, psql stops responding again.

I then have to kill the java processes on both nodes.

killall java


And restart

sudo /usr/local/gridsql-1.0/bin/gs-server.sh -d test

sudo /usr/local/gridsql-agent-1.0/bin/gs-agent.sh -n 2


Just to summarize:

  • I'm able to run the grid without the agents and have it report back errors from the underlying nodes without incident.

  • Adding in the agent causes node errors not to be reported back to psql, which locks up the gridsql system.
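
Until the hang itself is resolved, a client-side guard can at least keep a test script from wedging on a stuck session. The Python sketch below runs a command with a time limit and kills it if the limit is exceeded. The helper name is hypothetical, and the sleeping child process only stands in for a hung psql invocation; it does nothing to fix the coordinator or agents.

```python
import subprocess
import sys


def run_with_timeout(command, timeout_seconds):
    """Run a command; return its exit code, or None if it had to be killed."""
    try:
        completed = subprocess.run(command, timeout=timeout_seconds)
        return completed.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left hanging.
        return None


# Stand-in for a hung psql session: a child process that sleeps for 30 seconds.
hung = [sys.executable, "-c", "import time; time.sleep(30)"]
print(run_with_timeout(hung, 1))  # -> None
```

In practice the command list would be a psql call, for example psql with -U, -p 6453, -d test, -h localhost and -c followed by the query.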


Sorry for the long string of steps and thanks for taking the time to look into this.

Jon
Mason_S

Senior member

Joined: 1 Apr 2008 09:03:08
Messages: 380
Offline


Hi Jon,

We are still having problems reproducing the problem.

If it is not too much trouble can you set the log level to DEBUG (instead of INFO), on both the server and agent and retry to get us more logging output?
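
A minimal Python sketch of that change, assuming the level is controlled by the log4j.rootLogger line as in the agent config posted earlier in this thread (whether the server config uses the same key is an assumption). The function name is illustrative only.

```python
import re


def set_root_log_level(config_text, level):
    """Replace the level on the log4j.rootLogger line, keeping the appenders."""
    pattern = re.compile(r"^(log4j\.rootLogger\s*=\s*)\w+", re.MULTILINE)
    return pattern.sub(lambda m: m.group(1) + level, config_text)


# Logging lines as they appear in the agent config posted above.
agent_config = "log4j.rootLogger=WARN, console\nlog4j.logger.Server=ALL, console\n"

print(set_root_log_level(agent_config, "DEBUG"))
```

Only the rootLogger line is rewritten; the per-logger line for Server is left untouched.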

Thanks,

Mason
Mason_S

Senior member

Joined: 1 Apr 2008 09:03:08
Messages: 380
Offline

Jon,

Just letting you know that we did at least address detecting the missing GROUP BY sooner:

http://forums.enterprisedb.com/posts/list/1422.page

Thanks,

Mason
Mason_S

Senior member

Joined: 1 Apr 2008 09:03:08
Messages: 380
Offline

Hi Jon,

Someone here was able to reproduce the problem after all. He said that the problem disappeared when he added the node ids to agent config files, as discussed earlier in this thread.

Thanks,

Mason
Mason_S

Senior member

Joined: 1 Apr 2008 09:03:08
Messages: 380
Offline

A clarification to the previous statement: the problem disappears when adding port information for the nodes in the agent config file.

Thanks,

Mason
null

Member

Joined: 2 Apr 2008 17:21:41
Messages: 78
Offline

After installing GridSQL 2.0 and configuring it, I start gs-server.sh and then run gs-dbstart.sh -d test.

It always fails with the error 'can not bring db test online'.
Mason_S

Senior member

Joined: 1 Apr 2008 09:03:08
Messages: 380
Offline

Did you first run gs-createmddb.sh before starting?

Also, before using the database "test" you must first create it with gs-createdb.sh.

Regards,

Mason