Zip Streaming
The Scenario
I have a web server
that supports debugging of my application. As part of the debugging I would
like the option to download all log files that I have on my disk as a zip file.
Straightforward Solution
At first this looks
like a very simple problem. I need to go over all the folders where I might
have log files, add the files to a zip file, and then download that file.
Zip files
Java/Scala does not
have a very convenient API for adding files to a zip file, but a working
approach can be pieced together after searching the web (the tricky part is
keeping the folder structure relative to a parent folder):
    import java.io.{ByteArrayOutputStream, File}
    import java.nio.file.Files
    import java.util.zip.{ZipEntry, ZipOutputStream}

    object ZipUtils {

      // Builds the whole zip archive in memory and returns the backing buffer.
      def createZip(folderParents: Map[File, List[File]]): ByteArrayOutputStream = {
        val bos = new ByteArrayOutputStream()
        val zipFile = new ZipOutputStream(bos)
        try {
          folderParents.foreach { case (parentRoot, files) =>
            createZip(zipFile, parentRoot, files)
          }
        } finally {
          zipFile.close() // finishes the archive and closes bos as well
        }
        bos
      }

      def createZip(zipFile: ZipOutputStream, parentRoot: File, files: List[File]): Unit = {
        files.foreach { file =>
          // Entry name relative to the parent folder, so the folder structure is kept.
          val relativePath = parentRoot.toURI().relativize(file.toURI()).getPath()
          zipFile.putNextEntry(new ZipEntry(relativePath))
          try {
            // Copy raw bytes. Reading through scala.io.Source decodes characters,
            // and writing them one at a time corrupts anything that is not ASCII.
            Files.copy(file.toPath, zipFile)
          } finally {
            zipFile.closeEntry()
          }
        }
      }
    }
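To see what the relative entry name looks like, the following small sketch isolates the `relativize` call used above. The object name `RelativePathSketch`, the helper `entryName`, and the example paths are made up for illustration.

```scala
import java.io.File

object RelativePathSketch {

  // Path of `file` relative to `parentRoot`, with '/' separators.
  // If `file` is not located under `parentRoot`, URI.relativize returns the
  // file's URI unchanged, so the result is the absolute path instead.
  def entryName(parentRoot: File, file: File): String =
    parentRoot.toURI.relativize(file.toURI).getPath

  def main(args: Array[String]): Unit = {
    val root = new File("/var/log/myapp") // hypothetical log folder
    println(entryName(root, new File(root, "service/app.log"))) // service/app.log
  }
}
```

Using the relative path as the entry name is what keeps sub-folders such as `service/` visible inside the archive.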
To then download the
file from a Spring Boot server:
    @RequestMapping(value = Array("/troubleshoot/logs"), method = Array(GET))
    @ResponseBody
    def getLogFiles: ResponseEntity[Array[Byte]] = {
      // The complete archive is built in memory before the response starts.
      val res = ZipUtils.createZip(folderParents)
      val headers = new HttpHeaders()
      headers.add("Content-Disposition", "attachment; filename=\"log.zip\"")
      ResponseEntity
        .ok()
        .headers(headers)
        .contentLength(res.size())
        .contentType(MediaType.APPLICATION_OCTET_STREAM)
        .body(res.toByteArray)
    }
Problem
The issues begin once
we have very large log files. Once the log files are big (50 MB and above) we
start to see two issues. The first is that after the REST request is sent
it can take a while until the file starts to download. The second is that
we start to get out-of-memory exceptions.
The problem lies in the
fact that to create the zip file I wrote all the files into a ZipOutputStream
backed by a ByteArrayOutputStream, so the entire zip file is built in memory.
I need to read all the log files (maybe 500 MB) and finish creating the zip,
and only then does the download begin. This explains both the delay and the
out-of-memory exceptions.
Solution
The ultimate solution
to this problem is to stream the files to the client as we load them. This way
we will solve both issues. We should not be loading all the files into memory.
To do this we need on
one side to read the log files using a buffer and not load the full file (org.apache.commons.compress.utils.IOUtils
is a good utility), and on the other side to compress the data as it is
streamed out. For the compression we use GZIPOutputStream.
GZIPOutputStream compresses
data as it is written, so nothing has to be held back until the end. The flow
is: we open a log file, read a chunk into a buffer, write the chunk to the gzip
stream, and the compressed bytes start to download to the user. We repeat this
until all the files are sent, so we have end-to-end streaming whose memory
consumption is roughly the size of the buffer and no more.
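As a rough illustration of this flow, the sketch below uses only the standard library to copy an input stream through a GZIPOutputStream one buffer at a time. The names `GzipCopySketch`, `copyCompressed` and `decompress`, and the 8 KB buffer size, are illustrative choices and not part of the original code.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream, OutputStream}
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

object GzipCopySketch {

  // Reads `in` chunk by chunk and writes each chunk to a gzip stream wrapping `out`.
  // Only one buffer of `bufferSize` bytes is held by this method at any time.
  // Returns the number of uncompressed bytes that were read.
  def copyCompressed(in: InputStream, out: OutputStream, bufferSize: Int = 8192): Long = {
    val gzip = new GZIPOutputStream(out)
    val buffer = new Array[Byte](bufferSize)
    var total = 0L
    try {
      var read = in.read(buffer)
      while (read != -1) {
        gzip.write(buffer, 0, read)
        total += read
        read = in.read(buffer)
      }
    } finally {
      gzip.close() // writes the gzip trailer and closes `out`
    }
    total
  }

  // Helper for checking the result: decompresses gzip bytes back into memory.
  def decompress(bytes: Array[Byte]): Array[Byte] = {
    val in = new GZIPInputStream(new ByteArrayInputStream(bytes))
    val out = new ByteArrayOutputStream()
    val buffer = new Array[Byte](8192)
    try {
      var read = in.read(buffer)
      while (read != -1) {
        out.write(buffer, 0, read)
        read = in.read(buffer)
      }
    } finally {
      in.close()
    }
    out.toByteArray
  }
}
```

In the server, `out` would be the stream of the HTTP response rather than an in-memory buffer, so the compressed bytes leave the process as soon as they are produced.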
The problem with GZIPOutputStream
is that it compresses a single stream of bytes and has no notion of files or
folders. To solve this issue we will use the Apache Commons Compress class
TarArchiveOutputStream. The tar format supports both a folder structure
and streaming. So we let the TarArchiveOutputStream write into the GZIPOutputStream,
and we get a folder structure (a .tar.gz archive) that is both compressed and
streamed down to the client.
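    import java.io.{File, FileInputStream, OutputStream}
    import org.apache.commons.compress.archivers.tar.{TarArchiveEntry, TarArchiveOutputStream}
    import org.apache.commons.compress.utils.IOUtils

    def createTar(stream: OutputStream, folderParents: Map[File, List[File]]): Unit = {
      val tar = new TarArchiveOutputStream(stream)
      try {
        folderParents.foreach { case (parentRoot, files) =>
          createTar(tar, parentRoot, files)
        }
      } finally {
        tar.close() // writes the tar trailer and also closes the wrapped stream
      }
    }

    def createTar(tOut: TarArchiveOutputStream, parentRoot: File, files: List[File]): Unit = {
      files.foreach { file =>
        val relativePath = parentRoot.toURI().relativize(file.toURI()).getPath()
        // Use the relative path as the entry name so the folder structure is kept.
        val tarEntry = new TarArchiveEntry(file, relativePath)
        tarEntry.setSize(file.length())
        tOut.putArchiveEntry(tarEntry)
        val in = new FileInputStream(file)
        try {
          IOUtils.copy(in, tOut) // buffered copy, the file is never fully in memory
        } finally {
          in.close()
          tOut.closeArchiveEntry()
        }
      }
    }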
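    // outputStream is assumed here to be the output stream of the HTTP response.
    val gzos = new GZIPOutputStream(outputStream)
    try {
      createTar(gzos, folderParents)
    } finally {
      gzos.close() // finishes the gzip stream and closes outputStream as well
    }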