XQuery result changed


Masakazu Ichimura
Hi

I upgraded from eXist-snapshot-20050805 to eXist-snapshot-20051026
and found that an XQuery result changed.

I stored /db/sample.xml and executed the same XQuery in both snapshots,
but got different results.

Which result should I expect?
And can I get the result without the <exist:match> node in snapshot-20051026?

===== /db/sample.xml =====
<Root>
    <Test Name="test1">
        <Comment>foo</Comment>
    </Test>
    <Test Name="test2">
        <Comment>bar</Comment>
    </Test>
</Root>

===== XQuery =====
for $test in document("/db/sample.xml")/Root/Test
where $test/Comment = 'foo'
return $test

===== snapshot-20050805 result =====
<Test Name="test1">
    <Comment>foo</Comment>
</Test>

===== snapshot-20051026 result =====
<Test Name="test1">
    <Comment>
        <exist:match xmlns:exist="http://exist.sourceforge.net/NS/exist">foo</exist:match>
    </Comment>
</Test>

Thanks,
--masakazu.



-------------------------------------------------------
This SF.Net email is sponsored by the JBoss Inc.
Get Certified Today * Register for a JBoss Training Course
Free Certification Exam for All Training Attendees Through End of 2005
Visit http://www.jboss.com/services/certification for more information
_______________________________________________
Exist-open mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/exist-open

Re: XQuery result changed

wolfgangmm
Hi,

> And, can I get result without <exist:match> node in snapshot-20051026?

When using the fulltext index, the exist:match nodes can be inserted
by the serializer to mark those parts of the string value that
triggered a fulltext match. Sometimes, the query engine will also use
the fulltext index to speed up a simple "=" comparison, so you're
getting the exist:match markers though you didn't use a fulltext
operation.

There are various ways to turn this feature on or off. Either globally
in conf.xml (set <serializer match-tagging-elements="no"/>) or in the
query prolog:

declare option exist:serialize "highlight-matches=none";

Wolfgang



Re: XQuery result changed

Pierrick Brihaye-2
Hi,

Wolfgang Meier wrote:

> When using the fulltext index, the exist:match nodes can be inserted
> by the serializer to mark those parts of the string value that
> triggered a fulltext match. Sometimes, the query engine will also use
> the fulltext index to speed up a simple "=" comparison, so you're
> getting the exist:match markers though you didn't use a fulltext
> operation.
>
> There are various ways to turn this feature on or off. Either globally
> in conf.xml (set <serializer match-tagging-elements="no"/>) or in the
> query prolog:
>
> declare option exist:serialize "highlight-matches=none";

Given this clear explanation, I'd like to formulate an RFE:

would it be possible to have the reverse behaviour and replace these
"no" and "none" by "yes"?

Cheers,

--
Pierrick Brihaye, informaticien
Service régional de l'Inventaire / DRAC Bretagne
mailto:[hidden email] / tél : +33 (0)2 99 29 67 78
http://usenet-fr.news.eu.org/fr-chartes/rfc1855.html. L'avez-vous lu ?



Re: XQuery result changed

Michael Beddow-2
[Wolfgang]

> When using the fulltext index, the exist:match nodes can be inserted
> by the serializer to mark those parts of the string value that
> triggered a fulltext match. Sometimes, the query engine will also use
> the fulltext index to speed up a simple "=" comparison, so you're
> getting the exist:match markers though you didn't use a fulltext
> operation.
>
> There are various ways to turn this feature on or off. Either globally
> in conf.xml (set <serializer match-tagging-elements="no"/>) or in the
> query prolog:
>
> declare option exist:serialize "highlight-matches=none";

[Pierrick]

> Given this clear exclamation, I formulate an RFE :

> would it be possible to have the reverse behaviour and replace these
> "no" and "none" by "yes" ?


This prompts me to say again that I think the configuration mechanism for
this feature needs some tidying up. It is rather confusing to have the
quadruple "none/elements/attributes/both" option in the prolog settings,
alongside the binary "yes/no" for match-tagging-elements and
match-tagging-attributes in conf.xml. I understand the history behind this
divergence, but I think sooner or later it needs a rethink.
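
To make the mismatch concrete, here is a minimal Python sketch of how the two yes/no settings could be folded into the single four-valued prolog setting. The function names are hypothetical, and the correspondence itself (for example, elements "yes" plus attributes "no" giving "elements") is an assumption inferred from the option names, not something taken from eXist's code or documentation.

```python
def highlight_mode(tag_elements: bool, tag_attributes: bool) -> str:
    """Fold two yes/no flags into one four-valued mode (assumed mapping)."""
    if tag_elements and tag_attributes:
        return "both"
    if tag_elements:
        return "elements"
    if tag_attributes:
        return "attributes"
    return "none"


def serialize_option(tag_elements: bool, tag_attributes: bool) -> str:
    """Build the prolog declaration for the assumed equivalent mode."""
    mode = highlight_mode(tag_elements, tag_attributes)
    return f'declare option exist:serialize "highlight-matches={mode}";'


# conf.xml-style flags: element tagging off, attribute tagging off
print(serialize_option(False, False))
```

A single setting with four values would remove the need for any such translation between the two places.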

On the main issue -- match tagging on or off by default -- I personally am
in two (though fortunately not four) minds.  This was discussed here a
couple of years ago when I expressed some unease about the mechanism for
tagging attribute matches, which eXist does by inserting marker characters
into the returned values. Though it's hard to see how else it could be done,
I was unhappy about it as a default behaviour, because it arguably alters
infoset content in a way that could be tricky to identify or reverse. So
does marking of text content matches via <exist:match> elements, but since
the additions are namespaced, there are easy and standard techniques for
ignoring or removing the additions, even without using config options. ISTR
that the upshot of that debate was that, for a time at least, attribute
value match marking was turned off by default while element text content
match marking was left turned on.
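
A minimal sketch of one such technique, assuming the result has already been serialized and is post-processed on the client side. It uses Python's standard xml.etree.ElementTree; strip_matches is a hypothetical helper name, not an eXist API, and the namespace URI is the one shown in the original post.

```python
import xml.etree.ElementTree as ET

EXIST_NS = "http://exist.sourceforge.net/NS/exist"
MATCH_TAG = f"{{{EXIST_NS}}}match"


def strip_matches(elem):
    """Unwrap every exist:match descendant of elem in place, keeping its text."""
    prev = None  # last child kept so far; unwrapped text goes onto its tail
    for child in list(elem):
        strip_matches(child)
        if child.tag != MATCH_TAG:
            prev = child
            continue
        # Text to splice back in: the marker's content plus the text after it.
        spliced = "".join(child.itertext()) + (child.tail or "")
        if prev is None:
            elem.text = (elem.text or "") + spliced
        else:
            prev.tail = (prev.tail or "") + spliced
        elem.remove(child)
    return elem


tagged = (
    '<Test Name="test1"><Comment>'
    f'<exist:match xmlns:exist="{EXIST_NS}">foo</exist:match>'
    "</Comment></Test>"
)
clean = ET.tostring(strip_matches(ET.fromstring(tagged)), encoding="unicode")
print(clean)
```

The same approach does not work for attribute matches, since the marker characters inserted there carry no namespace to key on.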

My possibly somewhat cynical hunch is that leaving things that way (implying
"no" to turn element text content match tagging off, rather than "yes" to
turn it on) will reduce noise on this list. This is based on a guess that
"out of the box" and "quick-fix" deployers will want match tagging, and will
come straight on to this list asking how to enable it (or else asking
wide-eyed why eXist doesn't support it) if they don't get it by default.
Whereas those who have good reasons for not wanting match tagging will be
more likely to be sufficiently savvy to realise that it's highly likely to
be configurable,  and sufficiently considerate to RTFM or browse through
these archives and so find out how to configure it without needing to ask
here. But that does rather presuppose that the configuration mechanisms and
settings are made a bit more consistent than they are at present.

And I can see, too, that there's a problem in the OP instance here, where a
user has employed a standard comparison or matching operator or function,
where match tagging is not to be expected, but  eXist happens to have
optimised the operation via a fulltext index lookup. I'd say that, ideally,
match marking should happen only where the user has deliberately employed an
extension fulltext operator or function, and shouldn't occur where
transparent fallback to fulltext lookup has been performed internally by
eXist. Otherwise, the results may contain a confusing mixture of marked and
unmarked matches.

Michael Beddow






Re: XQuery result changed

Pierrick Brihaye-2
Hi,

Michael Beddow wrote:

> alongside the binary "yes/no"

... "none" :-) ...

> for match-tagging-elements and
> match-tagging-attributes in conf.xml.

> My possibly somewhat cynical hunch is that leaving things that way (implying
> "no" to turn element text content match tagging off, rather than "yes" to
> turn it on) will reduce noise on this list. This is based on a guess that
> "out of the box" and "quick-fix" deployers will want match tagging,  and will
> come straight on to this list asking how to enable it (or else asking
> wide-eyed why eXist doesn't support it) if they don't get it by default.

Why not a mixed solution :

<!-- override default settings -->
<!-- as suggested by Michael and overwhelmingly accepted by all -->
<!-- see http://article.gmane.org/gmane.text.xml.exist/6848 -->
<serializer match-tagging-elements="yes"/>

in the *default* config file ?

> Whereas those who have good reasons for not wanting match tagging will be
> more likely to be sufficiently savvy to realise that it's highly likely to
> be configurable,

Personally, I am reluctant to modify the original XML without explicit
configuration options (i.e. with default ones).

I understand that the XSLT 2.0 and XQuery 1.0 Serialization methods
(http://www.w3.org/TR/xslt-xquery-serialization/) follow a similar
doctrine. Am I wrong?

> But that does rather presuppose that the configuration mechanisms and
> settings are made a bit more consistent than they are at present.

Can we still afford inconsistencies there :-) ? More seriously, this is
indeed a crucial point which we're trying to address without breaking
anything (read : slowly :-).

However, we've made a little progress on this topic: the server's config
now has a schema:
http://cvs.sourceforge.net/viewcvs.py/exist/eXist-1.0/server.xsd?rev=1.1&view=log

Cheers,

--
Pierrick Brihaye, informaticien
Service régional de l'Inventaire / DRAC Bretagne
mailto:[hidden email] / tél : +33 (0)2 99 29 67 78
http://usenet-fr.news.eu.org/fr-chartes/rfc1855.html. L'avez-vous lu ?



Re: XQuery result changed

Pierrick Brihaye-2
In reply to this post by Michael Beddow-2
Hi,

Back on the other part of this topic.

Michael Beddow wrote:

> And I can see, too, that there's a problem in the OP instance here, where a
> user has employed a standard comparison or matching operator or function,
> where match tagging is not to be expected, but  eXist happens to have
> optimised the operation via a fulltext index lookup. I'd say that, ideally,
> match marking should happen only where the user has deliberately employed an
> extension fulltext operator or function, and shouldn't occur where
> transparent fallback to fulltext lookup has been performed internally by
> eXist.

Why should <exist:match> tagging be restricted to full text matches?
I wish we could one day have a "universal" matcher that would tag the
result of any kind of Expression (provided that it is, of course, a
"location" in the XPointer sense of the term).

It may be too early to have a full discussion on this interesting topic
though.

Cheers,

--
Pierrick Brihaye, informaticien
Service régional de l'Inventaire / DRAC Bretagne
mailto:[hidden email] / tél : +33 (0)2 99 29 67 78
http://usenet-fr.news.eu.org/fr-chartes/rfc1855.html. L'avez-vous lu ?



Re: XQuery result changed

Michael Beddow-2
[MB]

> And I can see, too, that there's a problem in the OP instance here, where a
> user has employed a standard comparison or matching operator or function,
> where match tagging is not to be expected, but  eXist happens to have
> optimised the operation via a fulltext index lookup. I'd say that, ideally,
> match marking should happen only where the user has deliberately employed an
> extension fulltext operator or function, and shouldn't occur where
> transparent fallback to fulltext lookup has been performed internally by
> eXist.

[PB]

> Why the <exist:match> tagging should be restricted to full text matches
> ? I wish we could have one day a "universal" matcher that would tag the
> result of any kind of Expression (provided that it is, of course, a
> "location" in the XPointer sense of the term).

I would agree. What I think is undesirable is the present behaviour where if
(a) match tagging is turned on and (b) a user employs a standard expression
which eXist chooses for performance reasons to evaluate using the fulltext
index, then the resulting matches are tagged, whereas the results of other
standard expressions from the same user on the same data with the same
config don't get tagged.  By "ideally" I really meant "it would be a good
idea to fix this, but not at the expense of more pressing issues".

But I can see it would be better still, if, configurably, all matches could
be marked (and the mention of XPointer flashes up the notion that maybe this
could be done by returning a piece of metadata, indicating via stand-off
markup the location of the matched items in the actual results, and so
conveying that information to those who need it while leaving the results
themselves infoset-clean).
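
A minimal sketch of that stand-off idea, assuming a client that receives a tagged result and splits it into a clean string value plus a separate list of match locations. Plain character offsets stand in here for XPointer-style locations; standoff is a hypothetical helper name, and nothing in this thread says eXist offers such an output mode.

```python
import xml.etree.ElementTree as ET

EXIST_NS = "http://exist.sourceforge.net/NS/exist"
MATCH_TAG = f"{{{EXIST_NS}}}match"


def standoff(elem):
    """Return (string value, [(start, end), ...]) for the exist:match spans."""
    parts, spans = [], []
    pos = 0  # current character offset into the string value

    def walk(node, in_match):
        nonlocal pos
        is_match = node.tag == MATCH_TAG
        start = pos
        if node.text:
            parts.append(node.text)
            pos += len(node.text)
        for child in node:
            walk(child, in_match or is_match)
            if child.tail:
                parts.append(child.tail)
                pos += len(child.tail)
        # Record only outermost markers so that spans never overlap.
        if is_match and not in_match:
            spans.append((start, pos))

    walk(elem, False)
    return "".join(parts), spans


tagged = (
    f'<Comment>a <exist:match xmlns:exist="{EXIST_NS}">foo</exist:match>'
    " b</Comment>"
)
text, spans = standoff(ET.fromstring(tagged))
print(text, spans)
```

Here the text is returned without any markers, and a consumer that wants highlighting can apply the offsets itself.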

But somewhere in here is the distinction between those who are after Data
Retrieval and those who are doing Information Retrieval (OK: InfoSci has
moved on a lot since the days when that distinction seemed perfectly clear,
but I still think it's a useful one when thinking in broad terms). With DR, I
want the data I asked for, period. So for the retrieval system specially to
mark it as the data I asked for makes no sense. With IR, I want to know, in
the first instance, which documents in a repository may contain what I'm
querying for, and which are most likely to be worth inspecting first if
there are a lot of them. And then, if I ask to see the candidate documents,
I need a way of quickly spotting the places in those documents that led the
IR system to believe they were what I was looking for. So with IR, match
marking in some form is of the essence. Now fulltext searching and IR are
closely related, though not identical. And any enhancement of eXist's IR
capabilities is likely to lie along the route of fulltext features,
enhancing the current ones, bringing in newer, standard versions as the W3C
firms up the XQuery FT proposals. So for that reason I think there is a
"special relationship" between fulltext operations and match marking,
whereas for the things the (current) standard XQuery operators and functions
are designed for, I'd say match marking would be often a convenience but
seldom a necessity.

Michael Beddow




Re: XQuery result changed

Pierrick Brihaye-2
Hi,

Michael Beddow wrote:

> What I think is undesirable is the present behaviour where if
> (a) match tagging is turned on and (b) a user employs a standard expression
> which eXist chooses for performance reasons to evaluate using the fulltext
> index, then the resulting matches are tagged, whereas the results of other
> standard expressions from the same user on the same data with the same
> config don't get tagged.

This isn't even worth discussing: it should be fixed :-)

> But I can see it would be better still, if, configurably, all matches could
> be marked (and the mention to XPointer flashes up the notion that maybe this
> could be done by returning a piece of metadata, indicating via stand-off
> markup the location of the matched items in the actual results, and so
> conveying that information to those who need it while leaving the results
> themselves infoset-clean).

I've thought about it. We could consider several serialisation modes for
the matcher.

[snipping the DR/IR discussion : I more or less agree with it]

> And any enhancement of eXist's IR
> capabilities is likely to lie along the route of fulltext features,

Mmmh... I will provide a counter-example which, I hope, will show that
eXist should consider *generic* API in this domain.

Spatial operators follow a similar logic: a spatial intersection is
similar to eXist's &= full text operator, a spatial union to the |= one,
a spatial buffer to the near() one, and match* *could* be considered as
spatial operations on MBRs (minimal bounding rectangles).
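
The intersection/union half of this analogy can be sketched with plain sets. This is a toy model only: it assumes that &= selects nodes containing all of the given terms and |= nodes containing any of them, and the index layout, node ids and function names are hypothetical rather than eXist internals.

```python
# Toy inverted index: term -> ids of the nodes whose text contains it.
index = {
    "foo": {1, 2},
    "bar": {2, 3},
}


def match_all(index, terms):
    """Assumed &= behaviour: nodes containing every term (set intersection)."""
    sets = [index.get(term, set()) for term in terms]
    return set.intersection(*sets) if sets else set()


def match_any(index, terms):
    """Assumed |= behaviour: nodes containing at least one term (set union)."""
    return set().union(*(index.get(term, set()) for term in terms))


print(match_all(index, ["foo", "bar"]))
print(match_any(index, ["foo", "bar"]))
```

A spatial index would answer the same two kinds of question with regions in place of term sets, which is the sense in which a generic API could cover both.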

So, I would tend to consider that eXist needs 2 kinds of indexes: exact
ones (DR oriented) and "approximate" ones (IR oriented), together with a
set of APIs - including matchers - to cope with them. Of course, in some
cases, the approximate indexes may help exact searches, just as
happened with Masakazu. This "help" may consist of post-filtering or
whatever, the ideal case being when there is nothing to filter out...

Regarding the nature of what an "approximation" should be, this should
be *highly* configurable IMHO: Arabic fulltext is not to be handled in
the same way as Anglo-Norman fulltext ;-) and MBR spatial indexes are
useless if you deal only with points.

Well... as usual all this is easy to say and hard to code. Going back to
code then... :-)

Cheers,

--
Pierrick Brihaye, informaticien
Service régional de l'Inventaire / DRAC Bretagne
mailto:[hidden email] / tél : +33 (0)2 99 29 67 78
http://usenet-fr.news.eu.org/fr-chartes/rfc1855.html. L'avez-vous lu ?

