Breaking binary files into chunks


nsincaglia
There are a number of REST cloud storage APIs that require one to use multipart uploads (POST or PUT) once the file size exceeds a certain amount.

Is there a function within eXist-db, or in an external module, that lets me easily extract a specified number of bytes from a binary file, so that I can upload the file in chunks through these REST cloud storage APIs?

If there is no convenient function or module, does anyone have a suggestion on the easiest way to extract, say, a set of 64 MB chunks of data from a binary file?

Best,

Nick



------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Exist-open mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/exist-open

Re: Breaking binary files into chunks

nsincaglia
To follow up on this post: I know that the EXPath File Module contains functions that perform the file-splitting operations I am looking for. Am I correct in saying that this module is not currently available for installation in eXist-db?

If I am correct in that statement, I am curious, how do people perform multipart posts with binary files? Can anyone share how they do it or recommend a preferred way of going about this?

Nick
 

Re: Breaking binary files into chunks

Alister Pillow-3
Hi Nick,
I thought you were the expert! :)

I found that posting to “netcat” running on localhost (Linux, MacOS) was the best way to debug http:send-request:

nc -l 8123 >> tcp-data.txt

I also found that if you post multipart with http-version="1.0":
<http:request method="post" http-version="1.0">

the post is sent WITH the correct Content-Length header,
and WITHOUT the http-version attribute the content is sent with Transfer-Encoding: chunked.




A cut-down version of how I use send-request with AWS:

xquery version "3.0";

(: EXPath HTTP Client; util, xmldb and transform are eXist-db built-in modules :)
import module namespace http = "http://expath.org/ns/http-client";

let $full-path := '/db/test/test.xqm'
(: content of the binary resource, as xs:base64Binary :)
let $file-data := util:binary-doc($full-path)
let $mime-type := xmldb:get-mime-type(xs:anyURI($full-path))

(: file name reported in the Content-Disposition header of the file part :)
let $binary := 'test.xqm'

(: stand-in for the signed form fields; note that this input carries @type, not @name,
   so the "input" template below emits an empty part name :)
let $aws-signed-form := <form><input type='filename' value='/db/test/db.xml' /></form>

(: stylesheet that turns the form into an http:request; single braces are evaluated
   by XQuery, doubled braces reach XSLT as an attribute value template :)
let $ss := <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
            xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xd="http://www.oxygenxml.com/ns/doc/xsl"
            xmlns:h="http://www.w3.org/1999/xhtml"
            xmlns:http="http://expath.org/ns/http-client"
            exclude-result-prefixes="xs xd" version="2.0">

            <xsl:template match="/form">
                <http:request method="post" http-version="1.0">
                    <http:header name="Connection" value="close" />
                    <http:multipart media-type="multipart/form-data" boundary='xyzBouNDarYxyz'>
                        <xsl:apply-templates />
                        <http:header name="Content-Disposition" value="form-data; name=file; filename={$binary}"/>
                        <http:header name="Content-Type" value="{$mime-type}" />
                        <http:body media-type="{$mime-type}" method="binary">{$file-data}</http:body>
                    </http:multipart>
                </http:request>
            </xsl:template>

            <xsl:template match="input">
                <http:header name="Content-Disposition" value="form-data; name={{./@name}}" />
                <http:body media-type="text/plain"><xsl:value-of select="./@value"/></http:body>
            </xsl:template>
        </xsl:stylesheet>

(: build the request element, then send it to the local netcat listener :)
let $content := transform:transform($aws-signed-form, $ss, ())
let $resp := http:send-request($content, 'http://localhost:8123')
return $resp


With the http-version set, it produced 

POST / HTTP/1.0
Connection: close
Content-Type: multipart/form-data; boundary="xyzBouNDarYxyz"
Content-Length: 371
Host: localhost:8123
User-Agent: Apache-HttpClient/4.3.6 (java 1.5)

--xyzBouNDarYxyz
Content-Disposition: form-data; name=
Content-Type: text/plain; charset=UTF-8

/db/test/db.xml
--xyzBouNDarYxyz
Content-Disposition: form-data; name=file; filename=test.xqm
Content-Type: application/xquery

xquery version "3.0";

module namespace p1 = 'http://pekoe.io/test';

declare function p1:one($a,$b) {
    'good'
};
--xyzBouNDarYxyz--


And with NO http-version it defaults to HTTP/1.1

POST / HTTP/1.1
Connection: close
Content-Type: multipart/form-data; boundary="xyzBouNDarYxyz"
Transfer-Encoding: chunked
Host: localhost:8123
User-Agent: Apache-HttpClient/4.3.6 (java 1.5)

73
--xyzBouNDarYxyz
Content-Disposition: form-data; name=
Content-Type: text/plain; charset=UTF-8

/db/test/db.xml
0






Re: Breaking binary files into chunks

nsincaglia
Hi Alister,
I am not sure that what you posted helps me. The REST API we are trying to communicate with only allows PUT calls that contain 64 MB of data or less. If a file is larger than 64 MB, we need to send multiple PUT calls, each with a different chunk of the binary file, and then make another web service call to tell the API to combine the data received in the previous calls.
Therefore, I am trying to understand how to read a binary file in chunks so that I can send those chunks over HTTP to a REST API. The EXPath File Module (http://expath.org/spec/file) appears to be able to do this with the function:
file:read-binary($file as xs:string,
                 $offset as xs:integer,
                 $length as xs:integer) as xs:base64Binary
However, I don’t think this module is available for installation in eXist-db. Is there any other way to do this?

Nick
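
For the offset and length arguments of a function like file:read-binary, the chunk boundaries can be worked out in plain XQuery. This is a minimal sketch, assuming the total size in bytes is already known; local:chunk-ranges and the chunk element are hypothetical names used only for illustration.

```xquery
xquery version "3.0";

(: Returns one <chunk> element per range; offsets are zero-based. :)
declare function local:chunk-ranges(
    $total as xs:integer,
    $chunk-size as xs:integer
) as element(chunk)* {
    if ($total le 0 or $chunk-size le 0) then ()
    else
        for $i in 0 to ($total - 1) idiv $chunk-size
        let $offset := $i * $chunk-size
        (: the last chunk may be shorter than the limit :)
        let $length := min(($chunk-size, $total - $offset))
        return <chunk index="{$i + 1}" offset="{$offset}" length="{$length}"/>
};

(: a 150 MiB file split under a 64 MiB limit :)
local:chunk-ranges(150 * 1024 * 1024, 64 * 1024 * 1024)
```

For a 150 MiB file and a 64 MiB limit this gives three ranges: two full chunks at offsets 0 and 67108864, and a final chunk of 23068672 bytes at offset 134217728.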





Nick Sincaglia
President/Founder
NueMeta LLC
Digital Media & Technology
Phone: +1-630-303-7035
Skype: nsincaglia






Re: Breaking binary files into chunks

Alister Pillow-3
Hi Nick,
I was responding to this question...


>>> how do people perform
>>> multipart posts with binary files? Can anyone share how they do it or
>>> recommend a preferred way of going about this?

Changing that example to use PUT (with the default HTTP/1.1) produces the result below. Isn’t this how a chunked transfer works?

PUT / HTTP/1.1
Connection: close
Content-Type: multipart/form-data; boundary="xyzBouNDarYxyz"
Transfer-Encoding: chunked
Host: localhost:8123
User-Agent: Apache-HttpClient/4.3.6 (java 1.5)

73
--xyzBouNDarYxyz
Content-Disposition: form-data; name=
Content-Type: text/plain; charset=UTF-8

/db/test/db.xml
0




Re: Breaking binary files into chunks

Alister Pillow-3
In reply to this post by nsincaglia
Apologies, I now see that you don’t want Transfer-Encoding: chunked.

On 15 Mar 2017, at 12:46 pm, Nick Sincaglia <[hidden email]> wrote:

If a file is larger than 64 MB, we need to send multiple PUT calls with different chunks of the binary file and then send another web service call to tell the API to combine the data received from the previous web service calls.
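
The distinction matters: Transfer-Encoding: chunked only changes how the body of a single request is framed on the wire, whereas the API described here expects several separate PUT requests. A rough sketch of the latter, reusing the http:body pattern from the example earlier in the thread. The endpoint URL, the part query parameter and the function names are assumptions made for illustration, not part of any real API.

```xquery
xquery version "3.0";

import module namespace http = "http://expath.org/ns/http-client";

(: Sends one chunk as its own PUT request.
   Hypothetical endpoint layout: parts are numbered from 1 via ?part=N. :)
declare function local:put-part(
    $base-url as xs:string,
    $part as xs:integer,
    $chunk as xs:base64Binary
) as item()+ {
    let $request :=
        (: http-version="1.0" so that a Content-Length header is sent :)
        <http:request method="put" http-version="1.0">
            <http:header name="Connection" value="close"/>
            <http:body media-type="application/octet-stream" method="binary">{$chunk}</http:body>
        </http:request>
    return http:send-request($request, concat($base-url, '?part=', $part))
};

(: Sends every chunk in order and reports the HTTP status of each request. :)
declare function local:put-all(
    $base-url as xs:string,
    $chunks as xs:base64Binary*
) as element(part)* {
    for $chunk at $n in $chunks
    let $response := local:put-part($base-url, $n, $chunk)
    return <part number="{$n}" status="{$response[1]/@status}"/>
};

(: placeholder URL and two tiny stand-in chunks :)
local:put-all(
    'https://storage.example.com/upload/session-1',
    (xs:base64Binary('aGVsbG8g'), xs:base64Binary('d29ybGQ='))
)
```

The final "combine" call that the API requires is left out, since its shape depends entirely on the storage service.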



Re: Breaking binary files into chunks

Claudius Teodorescu

Re: Breaking binary files into chunks

wshager
Why not do it in XQuery? Since a document is base64-encoded, you know its size. Just convert your xs:base64Binary to a string using fn:string() and start counting.

2017-03-15 9:19 GMT+01:00 Claudius Teodorescu <[hidden email]>:
Hi,


Maybe http://expath.org/spec/binary#part?
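
The function behind that link is bin:part, which the EXPath Binary Module specifies with a zero-based offset and a size in bytes. A minimal sketch of splitting with it, assuming an implementation of that module is installed under its standard namespace (whether it is available in a given eXist-db install is not confirmed in this thread); local:split is a hypothetical helper name.

```xquery
xquery version "3.0";

(: Assumption: the processor provides the EXPath Binary Module. :)
import module namespace bin = "http://expath.org/ns/binary";

declare function local:split(
    $data as xs:base64Binary,
    $chunk-size as xs:integer
) as xs:base64Binary* {
    let $total := bin:length($data)
    return
        if ($total eq 0 or $chunk-size le 0) then ()
        else
            for $i in 0 to ($total - 1) idiv $chunk-size
            let $offset := $i * $chunk-size
            (: the last part may be shorter than $chunk-size :)
            return bin:part($data, $offset, min(($chunk-size, $total - $offset)))
};

(: resource path taken from the example earlier in the thread :)
local:split(util:binary-doc('/db/test/test.xqm'), 64 * 1024 * 1024)
```

This works on a value that is already in memory, so it does not by itself avoid loading the whole file.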






--

W.S. Hager
Lagua Web Solutions
http://lagua.nl



Re: Breaking binary files into chunks

Jonathan Rowell
In reply to this post by nsincaglia

Hi Nick,


the first question to ask is whether a file of that size (greater than 64 MB) can be read into an XQuery string at all!

If you had a streaming interface you could read blocks of a certain size and then send them, but I'm not aware of a streaming mechanism in eXist.

The only way I can see is to use some tool to split the file into chunks outside of eXist and then send them.

The one that I have used when I have had problems with attachments is hj-split (available for Windows).


Jonathan





Re: Breaking binary files into chunks

nsincaglia
Jonathan,
OK. I don’t think splitting it up outside eXist-db will work for me. I need to incorporate this into a web service choreography that processes hundreds of files in a loop. My plan is to take the base64 string data I get when I retrieve a binary file with util:binary-doc(), break it into chunks, and send each chunk in a separate web service call, as the API requires. I am not sure whether that will work. I know I have to worry about base64 padding. I will give it a try to see whether it is a possibility.

Nick
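
On the padding concern: cutting the base64 string is safe as long as every cut falls on a multiple of four characters, because each group of four base64 characters encodes exactly three bytes. Only the final chunk can then contain "=" padding. A minimal sketch using only standard XQuery functions; local:base64-chunks is a hypothetical helper, and it still holds the whole file in memory as a string.

```xquery
xquery version "3.0";

declare function local:base64-chunks(
    $data as xs:base64Binary,
    $max-bytes as xs:integer
) as xs:base64Binary* {
    (: Drop any whitespace so that character positions map cleanly to bytes. :)
    let $text := replace(string($data), '\s+', '')
    (: 3 bytes correspond to 4 characters; round the limit down to a whole
       group. Limits below 3 bytes are bumped up to a single group. :)
    let $chars := max((1, $max-bytes idiv 3)) * 4
    let $length := string-length($text)
    return
        if ($length eq 0) then ()
        else
            for $i in 0 to ($length - 1) idiv $chars
            return xs:base64Binary(substring($text, $i * $chars + 1, $chars))
};

(: "hello world" is 11 bytes; a 6-byte limit yields "hello " and "world". :)
for $chunk in local:base64-chunks(xs:base64Binary('aGVsbG8gd29ybGQ='), 6)
return string($chunk)
```

The query returns the strings aGVsbG8g and d29ybGQ=. With a limit of 64 * 1024 * 1024 the helper rounds down to 67108863 bytes per chunk, since 67108864 is not a multiple of three.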


Re: Breaking binary files into chunks

wshager
Nick,

Jonathan is right, and I was being ironic. You need streaming, or you'll blow up the heap. There's no streaming reader in eXist AFAIK, so there is no XQuery solution ATM.

2017-03-15 17:33 GMT+01:00 Nick Sincaglia <[hidden email]>:
Jonathan,
OK. I don’t think splitting it up outside eXist-db will work for me. I need to incorporate this into a web service choreography that processes hundreds of files in a loop. I was going to try to manipulate the base64 String data I get when I retrieve a binary file using util:binary-doc() and break that into chunks and sending each chunk in a separate web service call like the API requires. I am not sure if that will work or not. I know I have to worry about base64 binary padding. I was going to give it a try to see if it is a possibility or not.

Nick

On Mar 15, 2017, at 8:16 AM, Jonathan Rowell <[hidden email]> wrote:

Hi Nick,

the first question to ask is whether the file size (being greater than 64M) is such that you can read it into an XQuery string!
If you had a streaming interface you could read blocks of a certain size and then send them, but I'm not aware of a stream mechanism in eXist.
The only way I an see is to use some tool to split the file into chunks outside of eXist and then send then.
The one that I have used when I have had problems with attachments is hj-split (available under windows).

Jonathan



From: Nick Sincaglia <[hidden email]>
Sent: Wednesday, March 15, 2017 2:16 AM
To: Alister Pillow
Cc: exist-open
Subject: Re: [Exist-open] Breaking binary files into chunks
 
Hi Alister,
I am not sure that what you have posted below helps me. The REST API we are trying to communicate with only allows PUT calls that contain 64 MB of data or less. If a file is larger than 64 MB, we need to send multiple PUT calls with different chunks of the binary file and then send another web service call to tell the API to combine the data received from the previous web service calls.
Therefore, I am trying to understand how to read a binary file in chunks so that I can send those chunks over HTTP to communicate with a REST API. The EXPath File Module appears to be able to do this (http://expath.org/spec/file) using the function:

file:read-binary($file as xs:string,
                 $offset as xs:integer,
                 $length as xs:integer) as xs:base64Binary
However, I don’t think this module is available as an install into eXist-db. Is there any other way one can do this?
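
Wherever a function with that signature is available, the offset and length arguments for each chunk come from plain arithmetic. The sketch below only computes those pairs and reads no file. local:chunk-ranges and the chunk element are hypothetical names, and 0-based offsets and a positive chunk size are assumptions.

```xquery
xquery version "3.0";

(: Hypothetical helper: one <chunk offset="..." length="..."/> element per piece
   of a file of $size bytes. Offsets are ASSUMED to be 0-based and $chunk-size
   is assumed to be positive. :)
declare function local:chunk-ranges(
    $size as xs:integer,
    $chunk-size as xs:integer
) as element(chunk)* {
    for $i in 0 to xs:integer(ceiling($size div $chunk-size)) - 1
    let $offset := $i * $chunk-size
    (: the last piece is shorter whenever $size is not a multiple of $chunk-size :)
    let $length := min(($chunk-size, $size - $offset))
    return <chunk offset="{$offset}" length="{$length}"/>
};

(: A 150 MiB file cut into pieces of at most 64 MiB:
   two full pieces of 67108864 bytes and a final piece of 23068672 bytes. :)
local:chunk-ranges(150 * 1024 * 1024, 64 * 1024 * 1024)
```

Each pair could then be passed as $offset and $length to file:read-binary, with one PUT per chunk.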

Nick

On Mar 14, 2017, at 5:42 PM, Alister Pillow <[hidden email]> wrote:

Hi Nick,
I thought you were the expert! :)

I found that posting to “netcat” running on localhost (Linux, MacOS) was the best way to debug http:send-request:

nc -l 8123 >> tcp-data.txt

I also found that if you post multipart with the http-version=1.0
<http:request method="post" http-version="1.0">

… the post is sent WITH the correct Content-Length,
and WITHOUT the http-version, the content is sent chunked




————— a cut-down version of how I use send-request with AWS

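xquery version "3.0";

(: The EXPath HTTP Client has to be imported; util, xmldb and transform are
   assumed to be pre-bound by eXist-db. :)
import module namespace http = "http://expath.org/ns/http-client";

let $full-path := '/db/test/test.xqm'
let $file-data := util:binary-doc($full-path)
let $mime-type := xmldb:get-mime-type(xs:anyURI($full-path))

(: file name reported in the Content-Disposition header of the file part :)
let $binary := 'test.xqm'

(: Note: the "input" template below reads @name, but this input carries only
   @type and @value, which is why the captured output shows an empty "name=". :)
let $aws-signed-form := <form><input type='filename' value='/db/test/db.xml' /></form>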
let $ss := <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
            xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xd="http://www.oxygenxml.com/ns/doc/xsl"
            xmlns:h="http://www.w3.org/1999/xhtml"
            xmlns:http="http://expath.org/ns/http-client"
            exclude-result-prefixes="xs xd" version="2.0">
            
            <xsl:template match="/form">
                <http:request method="post" http-version="1.0">
                    <http:header name="Connection" value="close" />
                    <http:multipart media-type="multipart/form-data" boundary='xyzBouNDarYxyz'>
                        <xsl:apply-templates />                                
                        <http:header name="Content-Disposition" value="form-data; name=file; filename={$binary}"/>
                        <http:header name="Content-Type" value="{$mime-type}" />
                        <http:body media-type="{$mime-type}" method="binary">{$file-data}</http:body>
                    </http:multipart>
                </http:request>
            </xsl:template>
            
            <xsl:template match="input">
                <http:header name="Content-Disposition" value="form-data; name={{./@name}}" />
                <http:body media-type="text/plain"><xsl:value-of select="./@value"/></http:body>
            </xsl:template>
        </xsl:stylesheet>
    
        let $content := transform:transform($aws-signed-form, $ss,())
        
        let $resp := http:send-request($content, 'http://localhost:8123')
        return $resp


With the http-version set, it produced 

POST / HTTP/1.0
Connection: close
Content-Type: multipart/form-data; boundary="xyzBouNDarYxyz"
Content-Length: 371
Host: localhost:8123
User-Agent: Apache-HttpClient/4.3.6 (java 1.5)

--xyzBouNDarYxyz
Content-Disposition: form-data; name=
Content-Type: text/plain; charset=UTF-8

/db/test/db.xml
--xyzBouNDarYxyz
Content-Disposition: form-data; name=file; filename=test.xqm
Content-Type: application/xquery

xquery version "3.0";

module namespace p1 = 'http://pekoe.io/test';

declare function p1:one($a,$b) {
    'good'
};
--xyzBouNDarYxyz--


And with NO http-version it defaults to HTTP/1.1

POST / HTTP/1.1
Connection: close
Content-Type: multipart/form-data; boundary="xyzBouNDarYxyz"
Transfer-Encoding: chunked
Host: localhost:8123
User-Agent: Apache-HttpClient/4.3.6 (java 1.5)

73
--xyzBouNDarYxyz
Content-Disposition: form-data; name=
Content-Type: text/plain; charset=UTF-8

/db/test/db.xml
0





--

W.S. Hager
Lagua Web Solutions
http://lagua.nl



Re: Breaking binary files into chunks

nsincaglia
Thanks for the answer. Let me ask a follow up question.

We currently perform a number of operations on binary files in our XQuery programs within eXist-db, such as HTTP GET, PUT and POST, calculating MD5 hash sums, and FTP/SFTP uploads and downloads. We use util:binary-doc() to retrieve these binary files from the database just before we perform these operations. We don’t currently stream these files when performing these operations.

If we check that the binary file is smaller than our heap size prior to performing the operation, shouldn’t we be protected from heap memory crashes? Are you saying that streaming is necessary in all cases, or just in the case of binary files that exceed the Java heap size?
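
One caveat with such a check: comparing the raw file size to the heap size may be too optimistic if the binary ends up in memory as a base64 string. The sketch below is a back-of-the-envelope estimate only. local:base64-string-bytes is a hypothetical helper, not an eXist-db function, and the figure of 2 bytes per character is an assumption about how the JVM stores strings; how eXist-db actually holds the value may differ.

```xquery
xquery version "3.0";

(: Hypothetical estimate, not an eXist-db API: memory needed to hold a binary
   of $file-bytes bytes as a base64 string, ASSUMING 2 bytes per character. :)
declare function local:base64-string-bytes($file-bytes as xs:integer) as xs:integer {
    (: base64 emits 4 characters for every 3 bytes, rounded up to a whole group :)
    let $chars := xs:integer(ceiling($file-bytes div 3)) * 4
    return $chars * 2
};

(: a 64 MiB file: 89478488 characters, i.e. 178956976 bytes under the assumption :)
local:base64-string-bytes(64 * 1024 * 1024)
```

Under these assumptions a single copy of the string takes roughly 2.7 times the file size, before counting any substrings or intermediate copies made while processing it.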

Thanks for the help!

Nick


Re: Breaking binary files into chunks

wshager
That depends, but I'd say keep your memory footprint low. I don't see why you should invest all your resources in these operations. It's not what XQuery was intended to do, but they said the same about PHP and JavaScript once... Just be aware that picking apart strings in XQuery produces persistent (immutable) data structures out of the box, so you also need to know their performance characteristics. There's quite a bit of (un)marshalling going on in the background, and you want your buffer to stay as clean as possible, which is why you're better off at a lower level. As I'm no Java expert I can't help you with implementation details in eXist, so please present your use case to someone who is.


Re: Breaking binary files into chunks

wshager
That said, I wouldn't mind seeing some more low-level functionality in XQuery. fn:byte-length, anyone?

2017-03-15 23:05 GMT+01:00 W.S. Hager <[hidden email]>:
That depends, but I'd say keep your memory footprint low. I don't see why you should invest all resources into these operations. It's not what XQuery was intended to do, but they said the same about PHP and Javascript once... Just be aware that picking apart strings in XQuery will result in out-of-the-box persistent (immutable) data structures, and you also need to know their performance characteristics. There's quite a bit of (un)marshalling going on in the background, and you want your buffer to stay as clean as possible, and that's why you're better off on a lower level. As I'm no Java expert I can't help you with implementation details in eXist, so please present your use case to someone who is.

2017-03-15 22:34 GMT+01:00 Nick Sincaglia <[hidden email]>:
Thanks for the answer. Let me ask a follow up question.

We currently perform a number of operations on binary files in our XQuery programs within eXist-db such as HTTP GET, PUT, POST, calculate MD5 hash sums, FTP/SFTP upload and downloads. We use util:binary-doc() to retrieve these binary files from the database just before we perform these operations. We don’t stream these files when performing these operations currently 

If we check to see if the binary file is smaller than our heap size prior to performing this operation, shouldn’t be protected from heap memory crashes? Are you saying that streaming is necessary in all cases or just in the case of binary files that exceed the java heap size? 

Thanks for the help!

Nick

On Mar 15, 2017, at 3:36 PM, W.S. Hager <[hidden email]> wrote:

Nick,

Jonathan is right, and I was being ironic. You need streaming, or you'll blow up the heap. There's no streaming reader in eXist AFAIK so no xquery solution there ATM.

2017-03-15 17:33 GMT+01:00 Nick Sincaglia <[hidden email]>:
Jonathan,
OK. I don’t think splitting it up outside eXist-db will work for me. I need to incorporate this into a web service choreography that processes hundreds of files in a loop. I was going to try to manipulate the base64 String data I get when I retrieve a binary file using util:binary-doc() and break that into chunks and sending each chunk in a separate web service call like the API requires. I am not sure if that will work or not. I know I have to worry about base64 binary padding. I was going to give it a try to see if it is a possibility or not.

Nick

On Mar 15, 2017, at 8:16 AM, Jonathan Rowell <[hidden email]> wrote:

Hi Nick,

the first question to ask is whether the file size (being greater than 64M) is such that you can read it into an XQuery string!
If you had a streaming interface you could read blocks of a certain size and then send them, but I'm not aware of a stream mechanism in eXist.
The only way I an see is to use some tool to split the file into chunks outside of eXist and then send then.
The one that I have used when I have had problems with attachments is hj-split (available under windows).

Jonathan



From: Nick Sincaglia <[hidden email]>
Sent: Wednesday, March 15, 2017 2:16 AM
To: Alister Pillow
Cc: exist-open
Subject: Re: [Exist-open] Breaking binary files into chunks
 
Hi Alister,
I am not sure what you have posted below helps me. The REST API we are trying to communicate with only allows PUT calls that contain 64 MBs of data or less. If a file is larger than 64 MBs, we need to send multiple PUT calls with different chunks of the binary file and then send another web service call to tell the API to combine the data received from the previous web service calls.
Therefore,  I am trying to understand how to read a binary file in chunks so that I can send those chunks over http to communicate with a REST API. The Expath Flle Module appears to be able to do this (http://expath.org/spec/file) using the function:
Abstract. This proposal provides a file system API for XPath. It defines extension functions to perform file system related operations such as listing, reading ...

file:read-binary($file as xs:string,
                 $offset as xs:integer,
                 $length as xs:integer) as xs:base64Binary
However, I don’t think this module is available as an install into eXist-db. Is there any other way one can do this?

Nick

On Mar 14, 2017, at 5:42 PM, Alister Pillow <[hidden email]> wrote:

Hi Nick,
I thought you were the expert! :)

I found that posting to “netcat” running on localhost (Linux, MacOS) was the best way to debug http:send-request:

nc -l 8123 >> tcp-data.txt

I also found that if you post multipart with the http-version=1.0
<http:request method="post" http-version="1.0">

… the post is sent WITH the correct Content-Length,
and WITHOUT the http-version, the content is sent chunked




————— a cut-down version of how I use send-request with AWS

let $full-path := '/db/test/test.xqm'
let $file-data := util:binary-doc($full-path)
let $mime-type := xmldb:get-mime-type(xs:anyURI($full-path))

let $binary := 'test.xqm'
    
   
let $aws-signed-form := <form><input type='filename' value='/db/test/db.xml' /></form>
let $ss := <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
            xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xd="http://www.oxygenxml.com/ns/doc/xsl"
            xmlns:h="http://www.w3.org/1999/xhtml"
            xmlns:http="http://expath.org/ns/http-client"
            exclude-result-prefixes="xs xd" version="2.0">
            
            <xsl:template match="/form">
                <http:request method="post" http-version="1.0">
                    <http:header name="Connection" value="close" />
                    <http:multipart media-type="multipart/form-data" boundary='xyzBouNDarYxyz'>
                        <xsl:apply-templates />                                
                        <http:header name="Content-Disposition" value="form-data; name=file; filename={$binary}"/>
                        <http:header name="Content-Type" value="{$mime-type}" />
                        <http:body media-type="{$mime-type}" method="binary">{$file-data}</http:body>
                    </http:multipart>
                </http:request>
            </xsl:template>
            
            <xsl:template match="input">
                <http:header name="Content-Disposition" value="form-data; name={{./@name}}" />
                <http:body media-type="text/plain"><xsl:value-of select="./@value"/></http:body>
            </xsl:template>
        </xsl:stylesheet>
    
        let $content := transform:transform($aws-signed-form, $ss,())
        
        let $resp := http:send-request($content, 'http://localhost:8123')
        return $resp


With the http-version set, it produced 

POST / HTTP/1.0
Connection: close
Content-Type: multipart/form-data; boundary="xyzBouNDarYxyz"
Content-Length: 371
Host: localhost:8123
User-Agent: Apache-HttpClient/4.3.6 (java 1.5)

--xyzBouNDarYxyz
Content-Disposition: form-data; name=
Content-Type: text/plain; charset=UTF-8

/db/test/db.xml
--xyzBouNDarYxyz
Content-Disposition: form-data; name=file; filename=test.xqm
Content-Type: application/xquery

xquery version "3.0";

module namespace p1 = 'http://pekoe.io/test';

declare function p1:one($a,$b) {
    'good'
};
--xyzBouNDarYxyz--


And with NO http-version it defaults to HTTP/1.1

POST / HTTP/1.1
Connection: close
Content-Type: multipart/form-data; boundary="xyzBouNDarYxyz"
Transfer-Encoding: chunked
Host: localhost:8123
User-Agent: Apache-HttpClient/4.3.6 (java 1.5)

73
--xyzBouNDarYxyz
Content-Disposition: form-data; name=
Content-Type: text/plain; charset=UTF-8

/db/test/db.xml
0
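
Applied to the chunked-upload case discussed in this thread, the http-version observation could be used for a single PUT of one chunk. This is only a sketch: the URL and its part query parameter are hypothetical, and whether a given storage API actually requires a Content-Length header instead of chunked transfer encoding is an assumption to check against that API.

```xquery
xquery version "3.0";

import module namespace http = "http://expath.org/ns/http-client";

(: Hypothetical endpoint; here it points at the netcat listener used above,
   which records the request but never sends a response. :)
let $upload-url := "http://localhost:8123/upload?part=1"

(: Small literal standing in for one chunk of a larger binary :)
let $chunk := xs:base64Binary("SGVsbG8sIHdvcmxkIQ==")

(: http-version="1.0" is the setting reported above to produce a
   Content-Length header rather than Transfer-Encoding: chunked. :)
let $request :=
    <http:request method="put" http-version="1.0">
        <http:header name="Connection" value="close"/>
        <http:body media-type="application/octet-stream" method="binary"/>
    </http:request>

(: The body content is passed as the third argument of send-request :)
return http:send-request($request, $upload-url, $chunk)
```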









--

W.S. Hager
Lagua Web Solutions
http://lagua.nl





Re: Breaking binary files into chunks

nsincaglia
In reply to this post by wshager
Ok. Thanks again for the response. 

Yeah, we are using eXist-db and XQuery because we receive very complex XML documents which describe media files (audio, video, images and PDFs). We have to do basic operations like moving the files from point A to point B and calculating hash sums, which we must embed into the XML documents to pass along to other parties. It would be impractical for us to process the XML documents separately from the media files. We have developed a series of complex workflows using a linear step process that Dan McCreary initially created for us. We can do this for media files that are more than 1 GB in size. We configure our system and server to accommodate these larger files, but we calculate the size of each file before every operation to make sure we are not exceeding our heap size.
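
A size check of this kind might look roughly like the following. This is an illustration, not the actual workflow code: it assumes eXist-db's xmldb:size() and system:get-memory-max() are available under their default prefixes, and the 25% budget is an arbitrary figure. The stored size is also smaller than the memory the value occupies once it is held as base64 text.

```xquery
xquery version "3.0";

(: Arbitrary illustration: allow one binary to use at most 25% of the heap :)
declare variable $heap-fraction as xs:decimal := 0.25;

(: Hypothetical helper: true if the stored resource is within the budget :)
declare function local:fits-in-heap(
    $collection as xs:string,
    $resource as xs:string
) as xs:boolean {
    (: Assumed eXist-db functions: stored size in bytes, maximum JVM heap :)
    let $size := xmldb:size($collection, $resource)
    let $budget := system:get-memory-max() * $heap-fraction
    return $size le $budget
};

if (local:fits-in-heap("/db/test", "test.xqm"))
then util:binary-doc("/db/test/test.xqm")
else error(xs:QName("local:too-large"),
           "Resource exceeds the configured heap budget")
```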

eXist-db is actually more capable of performing binary operations than I think many people realize. It has gotten steadily better over time thanks to the efforts of Dan, Claudius, Adam, Wolfgang and others. We are definitely in the minority in trying to use eXist-db in this way, but the XML that describes these media files is so complex that it would be hard for me to imagine trying to process them using another programming language. And the media files need to travel with the XML files, or there will be nothing to describe what the media files are.

Thanks again!

Nick

Nick Sincaglia
President/Founder
NueMeta LLC
Digital Media & Technology
Phone: +1-630-303-7035
Skype: nsincaglia






Re: Breaking binary files into chunks

Jonathan Rowell

Hi Nick,


Recently I was told by Adam Retter that the functions around binary-doc() are actually stream based. This came up because in 3.0 RC 1 binary-doc() left the file handle open at the end of the operation. I used binary-doc() to read an .xql/.xqm file and compile it (to check it), and the result was that I couldn't clean out a package due to open files.


That said, I wondered why binary-doc() should be stream based. You have pointed out a very good use case. There is more and more need for a metadata representation of electronic media of various types, best represented by XML, and for that eXist is indeed a very good database. But this metadata is, unlike that of conventional media such as books, accompanied by megabytes of video/audio data, and its processing (storage and so on) needs at least to be managed by eXist, i.e. by XQuery, although the actual storage might be elsewhere. There might then be a case for a stream-like interface.


However, there is a simpler alternative, as implemented in Node.js, whereby a file, like an HTTP response, gets read in chunks. The size of the chunk is system dependent, and the chunk reader calls a callback routine with each chunk to be processed. The simplest case, when one has enough memory, is to concatenate the chunks to reconstruct the original file. In Node.js this is often done because the operation is asynchronous. When there is not enough memory, one processes each chunk as it arrives.


Now I'm not a Java programmer, but I don't think that writing an eXist plugin which chunk-reads a file and calls a callback should be that difficult. The only problem I see is how one persists the chunked data, i.e. performs a concat(). But if one only needed to send the data somewhere over HTTP, it would be quite easy. It would avoid all the concomitant problems of a stream interface.
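
The shape of such a chunk-callback interface can be illustrated in plain XQuery 3.0 with a function item as the callback. This is only a sketch of the calling convention, with the hypothetical name local:for-each-chunk: it slices a value that is already in memory, whereas the plugin proposed above would read each chunk from disk on the Java side.

```xquery
xquery version "3.0";

(: Hypothetical helper: calls $callback once per chunk, passing the chunk
   and its 1-based position. $chunk-bytes is rounded down to a multiple
   of 3 (and must be at least 3) so slices fall on base64 boundaries. :)
declare function local:for-each-chunk(
    $data as xs:base64Binary,
    $chunk-bytes as xs:integer,
    $callback as function(xs:base64Binary, xs:integer) as item()*
) as item()* {
    let $encoded := string($data)
    let $step := ($chunk-bytes idiv 3) * 4
    let $count := xs:integer(ceiling(string-length($encoded) div $step))
    for $i in 1 to $count
    return
        $callback(
            xs:base64Binary(substring($encoded, ($i - 1) * $step + 1, $step)),
            $i
        )
};

(: The callback here only labels each chunk; in practice it could issue
   one HTTP PUT per chunk instead. :)
local:for-each-chunk(
    xs:base64Binary("SGVsbG8sIHdvcmxkIQ=="),
    6,
    function($chunk as xs:base64Binary, $n as xs:integer) as xs:string {
        concat("part ", $n, ": ", string($chunk))
    }
)
```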


Just a thought.


Jonathan

 



