Return entire XML document & doc-uri with embedded triples using SPARQL

I have some XML documents with embedded triples.

I can run a sem:sparql query with no issues, but I'm not sure how to return, along with the results, the entire XML document that contains the embedded triples and that document's URI.

Does anyone know how to do this? Thanks in advance.

xquery version "1.0-ml";
import module namespace sem = "http://marklogic.com/semantics"  at "/MarkLogic/semantics.xqy";

let $res := sem:sparql('
  SELECT ?country
  WHERE {
    <http://example.org/news/Nixon> <http://example.org/wentTo> ?country
  }
  ',
  (),
  (),
  cts:and-query( (

    cts:or-query( (
      cts:element-value-query( xs:QName("source"), "AP Newswire" ),
      cts:element-value-query( xs:QName("source"), "BBC" )
   ) )
 ) )
)


return $res;

There may be a cleaner way (with less casting), but this is how I have done it before in XQuery:

xquery version "1.0-ml"; 

import module namespace sem = "http://marklogic.com/semantics" 
      at "/MarkLogic/semantics.xqy";

let $triples := sem:sparql('
SELECT *
WHERE
{ ?subject ?predicate ?object }
')

return sem:database-nodes($triples ! sem:triple(map:get(., "subject"), map:get(., "predicate"), map:get(., "object"))) ! fn:base-uri(.)
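
If the SPARQL query selects only some of the variables, the same approach still works as long as the fixed parts of the pattern are known. The following is a minimal sketch, not a tested implementation: it assumes the subject and predicate from the question's query and rebuilds a full sem:triple for each ?country binding before calling sem:database-nodes. The variable names $subject, $predicate, and $bindings are introduced here for illustration only.

```xquery
xquery version "1.0-ml";

import module namespace sem = "http://marklogic.com/semantics"
      at "/MarkLogic/semantics.xqy";

(: Assumption: subject and predicate are fixed in the query,
   so only ?country is left unbound. :)
let $subject := sem:iri("http://example.org/news/Nixon")
let $predicate := sem:iri("http://example.org/wentTo")

let $bindings := sem:sparql('
  SELECT ?country
  WHERE {
    <http://example.org/news/Nixon> <http://example.org/wentTo> ?country
  }
')

(: Rebuild a complete sem:triple per binding so that
   sem:database-nodes can locate the stored triple nodes. :)
for $node in sem:database-nodes(
  $bindings ! sem:triple($subject, $predicate, map:get(., "country"))
)
(: Return the document URI followed by the whole containing document. :)
return (xdmp:node-uri($node), fn:root($node))
```

A document that holds several matching triples would be returned once per triple here, so the URIs may need de-duplicating (for example with fn:distinct-values) before fetching the documents.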

I solved this myself by doing it the following way. I hope this helps someone else. It took me a solid 5+ hours of reading through the MarkLogic documentation, but I think I have a good handle on it all now. I'm not sure this is the fastest way, but it only took 93 ms.

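(: Assumes $rez holds the sem:sparql results and that the s: prefix is bound to
   "http://www.w3.org/2005/sparql-results#"; /blah is a placeholder root path. :)
for $id in sem:query-results-serialize($rez, 'xml')//s:uri/text()
return
  xdmp:node-uri((cts:search(/blah//sem:triple/*[text() = $id], ()))[1])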

Consider using cts:triple-range-query. It can be used in cts:search directly and blended into your other queries. It can only match individual triple patterns, but your example is simple enough for that.

To find all AP Newswire and BBC reports talking about Nixon visiting countries:

xquery version "1.0-ml";

for $doc in cts:search(
  collection(),
  cts:and-query( (
    cts:element-value-query( xs:QName("source"), ("AP Newswire", "BBC") ),
    cts:triple-range-query(
      sem:iri("http://example.org/news/Nixon"),
      sem:iri("http://example.org/wentTo"),
      ()
    )
  ) )
)[1 to 10]
return (xdmp:node-uri($doc), $doc)

You can also execute a SPARQL query first to resolve the more complex parts, and then feed its result into a cts:search with a cts:triple-range-query. For example, to find all reports about specific countries visited by Nixon (as reported by AP Newswire and BBC):

let $countries := sem:sparql('
  SELECT DISTINCT ?country
  WHERE {
    <http://example.org/news/Nixon> <http://example.org/wentTo> ?country
  }
  ',
  (),
  (),
  cts:element-value-query( xs:QName("source"), ("AP Newswire", "BBC") )
) ! map:get(., "country")

for $doc in cts:search(
  collection(),
  cts:triple-range-query(
    sem:iri("http://example.org/news/Nixon"),
    sem:iri("http://example.org/wentTo"),
    $countries[1 to 3]
  )
)[1 to 10]
return (xdmp:node-uri($doc), $doc)

Note the subtle differences between the above two examples.

HTH!

Comments
  • Interesting. This only works if you have the subject, predicate, and object though, right? If I select only the subject, for example, I'm not sure it will be able to get the database nodes.
  • @AriesOnTheCusp Yes, as far as I know you need a full sem:triple object for sem:database-nodes. However, it shouldn't be hard to build that since you have specified the missing subject and predicate in your query. Alternatively, you could include your document URI in your triples and link the other triples to it so you could figure this out in SPARQL.
  • It would be much more efficient to use cts:search(doc(), cts:element-value-query(xs:QName("sem:subject"), $id), 'unfiltered'). You could expand the cts:element-value-query to a cts:or-query to cover subject, predicate, and object if needed.
  • I thought using an XPath expression as the first parameter would be faster, since I'm specifying the document root node (filtering the set of possible documents). Is that incorrect?
  • Specifying the document root node can potentially be quick, but mostly when it rules out most of the documents. // is actually expensive in most cases. XPath could also force filtering of results to apply document ordering, though I'm not entirely sure whether that is also the case with search paths. There are usually many factors that influence performance. It is best to measure, and keep measuring.
  • Thanks, my SPARQL queries are more complex than the one I gave, so I can try your pre-executed query example. It seems like MarkLogic would already have the document when it's doing the cts:element-value-query() in the SPARQL query, so an additional cts:search seems odd to me. I wish there was a way to extract the node-uri when it already has the doc.
  • There is some doc awareness during SPARQL, but since the same triple can originate from multiple documents, doc awareness is lost in the sem bindings that are returned. Hence the double pass, but since it resolves from indexes, you hardly notice.
  • Small addition to my last comment: cts:element-value-query is not resolved by pulling up the actual document; it is resolved from the so-called Universal Index. MarkLogic tries to resolve as much as possible from indexes, which contain references to document fragments. The actual documents are not retrieved until data within them is used or returned. That is postponed until the [1 to 10] bit in my examples.