-
Notifications
You must be signed in to change notification settings - Fork 662
GH-2701: Fuseki Mod to list and abort running executions. #3184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
0f8a131
to
a5ca301
Compare
Really interesting piece of work, once did something much cruder (at least UI wise) in a previous $dayjob
Yes I think this would be much cleaner if the tracking mechanism was integrated directly into the execution machinery without requiring extra wrapping as you do in this PR. It would be nice if there were programmatic APIs for interacting with tracked queries/updates (there's some pieces towards that here but appears mostly focused on exposing stuff to the UI from my skim-reading of the code) so that applications that embed Jena could access and manage tracked queries/updates as desired. Fuseki already has the concept of Tasks that's used for things like backups and compactions, would it make sense to integrate query/update tracking into that rather than creating a separate tracking mechanism. That might need generalising that mechanism, or pulling it more into Jena's core rather than Fuseki machinery, so might not be worth the effort, wdyt? |
The ARQ Plugin adds An important question is, whether tracking executions within the DatasetGraph's context is the way to move forward.
Yes, I yet need to look into how much effort it would be to disentangle Fuseki' task tracker from the Fuseki - but adding such a mechanism to core (and updating Fuseki for it) would be most likely the way to go. |
28ccc1b
to
603c8e7
Compare
0d4c658
to
61039ae
Compare
public class ChainingQueryDispatcherExecTracker | ||
implements ChainingQueryDispatcher | ||
{ | ||
@Override | ||
public QueryExec create(Query query, DatasetGraph dsg, Binding initialBinding, Context context, | ||
QueryDispatcher chain) { | ||
QueryExec delegate = chain.create(query, dsg, initialBinding, context); | ||
QueryExec result = TaskTrackerRegistry.track(context, delegate); | ||
TaskTrackerRegistry.remove(context); | ||
return result; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Relevant code for intercepting query execution construction over a dataset graph using the newly proposed dispatcher system. Here, execution tracking is added.
The original code integrated exec tracking into the Update-/QueryEngineFactory system. However, this was sub-par because execution could only be intercepted on the QueryIterator level. My updated proposal is to introduce a new layer Update-/QueryDispatcher on top of the Update-/QueryEngineFactory machinery. Now it is possible to intercept any query / update request to a dataset - even without having to parse the query. Old design: QueryExecBuilderDataset -build-> QueryExecDataset -exec-> QueryEngineRegistry New design: QueryExecBuilderDataset -build-> QueryDispatcherRegistry -customized build-> QueryExec The last element of the dispatcher chain forwards the request to the usual Update-/QueryEngineFactory system. Consequences:
Related Ongoing WorkAs a demo for related work based on this infrastructure, we are using it to integrate third party triple stores - such as Qlever - into Fuseki. This way we can use one server framework to manage several triple stores. As a final note, we also already created an assembler that starts qlever from a docker image as part of Fuseki (via the Java TestContainer framework), so the configuration looks like this: <#baseDS> rdf:type qlever:Dataset ;
qlever:location "/run/databases/qlever/mydb/" ;
qlever:indexName "myindex" ;
qlever:accessToken "abcde" ;
qlever:memoryMaxSize "4G" ;
qlever:defaultQueryTimeout "600s" ; |
1833ad7
to
7f59fa7
Compare
91e9758
to
c107201
Compare
The PR should be fairly complete and cleaned up now such that it is ready for review. I updated the first post with a summary of the changes. |
@@ -355,7 +354,9 @@ public default boolean queryAsk(Query query) { | |||
* @return QueryExecution | |||
*/ | |||
@Override | |||
public QueryExec query(Query query); | |||
default public QueryExec query(Query query) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated RDFLink's default shortcuts to delegate to newQuery().query(...).build()
and newUpdate().update(...).build()
so that any processing that happens in builders is applied by default.
a4857d0
to
7a4fe8c
Compare
This is "quite big". Is there a best place to start reviewing to break up the feedback? |
@@ -34,7 +34,7 @@ | |||
* This class provides the Jena Graph interface to a remote SPARQL endpoint. | |||
* Efficiency not guaranteed. | |||
*/ | |||
|
|||
@Deprecated(forRemoval = true) // Superseded by DatasetGraphOverSparql |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just delete this class!
The fundamental changes in
The changes in
The pivotal question is, whether abstraction of remote endpoints should be done via DatasetGraph in the first place. |
public class ExampleDBpediaViaRemoteDataset { | ||
public static void main(String... argv) { | ||
// The query string is sent to the DBpedia endpoint as is (without parsing). | ||
// By default, Jena would fail to parse it because of the undeclared prefixes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it is possible ....
QueryExecHTTPBuilder qExecB = (QueryExecHTTPBuilder)link.newQuery();
QueryExec qExec = qExecB.queryString(queryString).build();
Only the HTTP variants have the queryString
option.
That could be exposed, defaulting to query(String)
for non-HTTP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing is that in this example the query goes via QueryExec.dataset(dsg)
- at this level the machinery does not see any HTTP involvement. By default it makes sense to apply parse check immediately when a conventional Jena dataset is involved.
DatasetGraphOverSparql.initContext()
sets the newly introduced ARQConstants.parseCheck=false
for its context. This flag is picked up by QueryExecBuilderDataset
.
Calling QueryExecBuilder.build()
then forwards the query string to SparqlDispatcherRegistry
.
ChainingQueryDispatcherForDatasetGraphOverRDFLink.java
handles DatasetGraphOverRDFLink
and forwards the query to the RDFLink
.
The handler is registered to the global SparqlDispatcherRegistry
in jena-rdfconnection
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By default it makes sense to apply parse check immediately when a conventional Jena dataset is involved.
Right - QueryExec.queryString
could default to calling QueryExec.query
, while the remote case has bypass behaviour.
At sometime (!!) maybe there'll be different local query engines.
import org.apache.jena.sparql.exec.RowSet; | ||
|
||
/** TODO Get rid of dependency to external service. */ | ||
public class TestGraphOverRDFLink { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class should be removed - it precedes ExampleDBpediaViaRemoteDataset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to remove it.
* The operation must already be registered with the builder. | ||
* @see #registerOperation(Operation, ActionService) | ||
*/ | ||
public Builder addEndpoint(String datasetName, String endpointName, Operation operation, AuthPolicy authPolicy, Context context) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addEndpoint with a custom context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry if these are a bit scattered. I'm trying to work my way through this valuable new feature.
* @param httpRequest | ||
* @return HttpResponse | ||
*/ | ||
public static CompletableFuture<HttpResponse<InputStream>> executeJDKAsync(HttpClient httpClient, HttpRequest httpRequest) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need this.
executeJDK*
is a system primitive and having one operation (+ non-async) makes that clearer IMO.
import org.apache.jena.http.HttpLib; | ||
import org.apache.jena.riot.web.HttpNames; | ||
import org.apache.jena.web.HttpSC; | ||
|
||
public class AuthLib { | ||
/** @see #authExecuteAsync */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please put in the javadoc, tweaking it with
Call {@link HttpClient} after applying an active {@link AuthRequestModifier}
@@ -59,7 +59,8 @@ private Endpoint(Operation operation, ValidString endpointName, AuthPolicy reque | |||
// Canonicalise to "" for dataset-level operations. | |||
this.endpointName = endpointName==null? DatasetEP : endpointName; | |||
this.authPolicy = requestAuth; | |||
this.context = context; | |||
// Always provide a context so that e.g. fmods can store data. | |||
this.context = context != null ? context : Context.create() ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Endpoints come from Endpoint.Builder - e.g. FusekiServer.serviceEndpointOperation - this decision could be in the builder.
import org.apache.jena.sparql.exec.RowSet; | ||
|
||
/** TODO Get rid of dependency to external service. */ | ||
public class TestGraphOverRDFLink { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to remove it.
public class ExampleDBpediaViaRemoteDataset { | ||
public static void main(String... argv) { | ||
// The query string is sent to the DBpedia endpoint as is (without parsing). | ||
// By default, Jena would fail to parse it because of the undeclared prefixes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By default it makes sense to apply parse check immediately when a conventional Jena dataset is involved.
Right - QueryExec.queryString
could default to calling QueryExec.query
, while the remote case has bypass behaviour.
At sometime (!!) maybe there'll be different local query engines.
@@ -44,7 +45,7 @@ public abstract class ExecHTTPBuilder<X, Y> { | |||
protected String serviceURL = null; | |||
private Query query = null; | |||
protected String queryString = null; | |||
protected boolean parseCheck = true; | |||
protected Boolean parseCheck = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional
?
import org.apache.jena.sparql.exec.QueryExec; | ||
import org.apache.jena.sparql.util.Context; | ||
|
||
public interface ChainingQueryDispatcher { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some explanatory javadoc, please.
import org.apache.jena.sparql.util.Context; | ||
import org.apache.jena.update.UpdateRequest; | ||
|
||
public interface ChainingUpdateDispatcher { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some explanatory javadoc, please.
jena-arq/src/main/java/org/apache/jena/sparql/engine/http/QueryExceptionHTTP.java
Outdated
Show resolved
Hide resolved
@@ -70,6 +71,17 @@ public UpdateProcessorBase(UpdateRequest request, | |||
} | |||
} | |||
|
|||
@Override | |||
public UpdateRequest getUpdateRequest() { | |||
// XXX Return a copy for safety? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Decide on the contract!
<dependency> | ||
<groupId>org.junit.jupiter</groupId> | ||
<artifactId>junit-jupiter-params</artifactId> | ||
<scope>test</scope> | ||
</dependency> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be removed because it is included with artifact junit-jupiter
.
<dependency> | ||
<groupId>org.seleniumhq.selenium</groupId> | ||
<artifactId>selenium-java</artifactId> | ||
<version>4.18.1</version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Version managed in the top POM. It is currently 4.35.0.
<dependency> | ||
<groupId>io.github.bonigarcia</groupId> | ||
<artifactId>webdrivermanager</artifactId> | ||
<version>5.8.0</version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Version managed by top POM - it is currently 6.2.0.
<dependency> | ||
<groupId>org.junit.vintage</groupId> | ||
<artifactId>junit-vintage-engine</artifactId> | ||
<scope>test</scope> | ||
</dependency> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be removed (changes in TestFMod_ExecTracker)
* Although, a headless Chrome should be started automatically, | ||
* this step turns out to not yet work reliable across all environments. | ||
*/ | ||
@Ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Ignore | |
@Disabled |
return graph; | ||
} | ||
|
||
@Before |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Before | |
@BeforeEach |
js.executeScript("window.lastEvent = null"); | ||
} | ||
|
||
@After |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@After | |
@AfterEach |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I position @AfterEach
next to @BeforeEach
.
import org.junit.After; | ||
import org.junit.Before; | ||
import org.junit.Ignore; | ||
import org.junit.Test; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JUjit4 -> Junit5:
import org.junit.After; | |
import org.junit.Before; | |
import org.junit.Ignore; | |
import org.junit.Test; | |
import org.junit.jupiter.api.AfterEach; | |
import org.junit.jupiter.api.BeforeEach; | |
import org.junit.jupiter.api.Disabled; | |
import org.junit.jupiter.api.Test; |
import org.apache.jena.sparql.exec.QueryExec; | ||
import org.apache.jena.sparql.util.Context; | ||
|
||
public interface QueryDispatcher { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please document with some javadoc.
import org.apache.jena.sparql.util.Context; | ||
import org.apache.jena.update.UpdateRequest; | ||
|
||
public interface UpdateDispatcher { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please document with some javadoc.
|
||
@Override | ||
public void end() { | ||
// Note: AbstractTestRDFConnection.transaction_bad_01() expects |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this comment.
|
||
/** | ||
* This class provides a base implementation of the Jena DatasetGraph interface | ||
* to a remote SPARQL endpoint. Efficiency not guaranteed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see anything making this remote-specific.
It would be weird normally ... maybe for logging/mocking for develoment and testing?
Maybe this class should be alongside the other DatasetGraph
.
public boolean supportsTransactions() { | ||
return false; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For clarity, please add
public boolean supportsTransactionAbort() {
return false;
}
* limitations under the License. | ||
*/ | ||
|
||
package org.apache.jena.fuseki.mod.exec.tracker; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be exectracker
?
What else would be org.apache.jena.fuseki.mod.exec
?
} | ||
} | ||
} | ||
protected void serveEvents(HttpAction action) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ("?command=events")is showing up as (-1, -1) in the log. Maybe this will help:
action.setResponseStatus(HttpSC.OK_200);
It would also benefit from a try-catch
just in case an internal error happens.
import jakarta.servlet.http.HttpServletResponse; | ||
|
||
/** | ||
* REST action handler for listing running query executions and stopping them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add some javadoc that describe the commands supported.
The fuseki:operation fuseki:tracker
will also need documentation for the website.
<h1>Ongoing Executions</h1> | ||
<table id="runningTasksTable"> | ||
<thead> | ||
<tr><th>ID</th><th>Start Time</th><th>Label</th><th>Action</th></tr> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a "Label".
I am seeing the whole query and I'm using short queries.
Is there a way to set the display?
Maybe - strip prefixes, use an abbreviated string of the query.
My first reaction on seeing the query string was that clicking would run the query. Saying "See Details" or some such would be clearer.
|
||
public class VocabAssemblerHTTP | ||
{ | ||
private static final String NS = "http://jena.apache.org/2025/http#"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be a new namespace? Are there name clashes?
(it is the usually problem of large namespaces vs remembering several namespaces)
jena-arq/src/main/java/org/apache/jena/sparql/engine/http/QueryExceptionHTTP.java
Outdated
Show resolved
Hide resolved
import org.apache.jena.sparql.engine.dispatch.QueryDispatcher; | ||
import org.apache.jena.sparql.util.Context; | ||
|
||
public class ChainingQueryDispatcherMain |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some explanatory javadoc, please.
"git rebase main" works with one update for some Junit4 code. |
f91303c
to
df88500
Compare
GitHub issue resolved #2701
Pull request Description: ARQ plugin + Fuseki Plugin to track and abort ongoing SPARQL executions.
I tried to design the changes to jena's core in such a way that execution tracking can be enabled without requiring any changes to existing code.
Summary of changes.
Interception of SPARQL Requests and Execution Tracking
Changes in
jena-arq
:SparqlDispatcherRegistry
+ infrastructure which allows to intercept SPARQL update and query statements (as objects or strings) against DatasetGraphs. Update-/QueryExecDataset now first delegates to the dispatcher chain.InitExecTracking
. If in the dispatcher chain there is aTaskListener
is the context, then the Update-/QueryExec instances are wrapped with a tracking wrapping such that the listener is notified.TaskListener
implementation isTaskEventBroker
which supports de-/registering of further listeners.n
executions, there isTaskEventHistory
which is a subclass ofTaskEventBroker
.parseCheck
dataset context attribute. If false, then update-/query requests are forwarded via the dispatcher without parsing.RDFLink-based Execution Tracking
This adds infrastructure to
jena-rdfconnection
in order to track executions against RDFLinks via the newly introduced classDatasetGraphOverRDFLink
.DatasetGraphOverSparql
. This is a base class injena-arq
that implements all methods by means of SPARQL requests. It is wired up with the tests inTS_SparqlCore
. Caveat: As each update is a separate request and bulk updates may be split into multiple requests, blank nodes may not work as expected.DatasetGraphOverRDFLink
as a subclass ofDatasetGraphOverSparql
which provides anewRDFLink()
method and implements all DatasetGraph methods based on the RDFLink.DatasetGraphOverRDFLink
and delegates them to the RDFLink.DatasetAssemblerHTTP
which configures aDatasetGraphOverRDFLink
instance, which allows use of this system with Fuseki.ExampleDBpediaViaRemoteDataset
which demonstrates a query against such a dataset making use of virtuoso specific features.Fuseki Mod: Execution Tracker
Added a simple web page that will show a live view of ongoing and completed queries.
TaskEventHistory
in the endpoint context and connect it to the dataset context'sTaskEventBroker
(broker will be created if absent).Jena-Query-Dashboard.webm
Misc changes
QueryExecHTTP
: Improved support to cancel HTTP requests. So far cancel would hang until the InputStream of the HTTP response could be obtained. Now the HTTP request can be cancelled immediately.jena-geosparql
with that of the execution tracking.By submitting this pull request, I acknowledge that I am making a contribution to the Apache Software Foundation under the terms and conditions of the Contributor's Agreement.
See the Apache Jena "Contributing" guide.