Saturday, January 16, 2016

Zip Streaming Log Files


The Scenario

I have a web server that supports debugging of my application. As part of the debugging I would like the option to download all log files that I have on my disk as a zip file.

Straightforward Solution

At first this looks very simple: go over all the folders where I might have log files, add the files to a zip file, and then download that file.

Zip files

Java/Scala does not have a very convenient API for adding files to a zip file, but java.util.zip can do the job. The tricky part is keeping the folder structure relative to a parent folder:
import java.io.{ByteArrayOutputStream, File}
import java.nio.file.Files
import java.util.zip.{ZipEntry, ZipOutputStream}

// Builds the whole zip in memory. Keys are parent folders, values are the log files under them.
def createZip(folderParents: Map[File, List[File]]): ByteArrayOutputStream = {
  val bos = new ByteArrayOutputStream()
  val zipFile = new ZipOutputStream(bos)
  try {
    folderParents.foreach { case (parentRoot, files) => createZip(zipFile, parentRoot, files) }
  }
  finally {
    // close() also finishes the zip and closes the underlying stream
    zipFile.close()
  }
  bos
}

def createZip(zipFile: ZipOutputStream, parentRoot: File, files: List[File]): Unit = {
  files.foreach { file =>
    // Entry name relative to the parent folder, so the folder structure is kept
    val relativePath = parentRoot.toURI().relativize(file.toURI()).getPath()
    zipFile.putNextEntry(new ZipEntry(relativePath))
    try {
      // Copy raw bytes; scala.io.Source would decode the file as characters
      Files.copy(file.toPath, zipFile)
    }
    finally { zipFile.closeEntry() }
  }
}

To then download the file from a Spring Boot server:
@RequestMapping(value = Array("/troubleshoot/logs"), method = Array(GET))
@ResponseBody
def getLogFiles: ResponseEntity[Array[Byte]] = {
  // The complete zip is held in memory before the response starts
  val res = ZipUtils.createZip(folderParents)
  val headers = new HttpHeaders()
  headers.add("Content-Disposition", "attachment; filename=\"log.zip\"")

  ResponseEntity
    .ok()
    .headers(headers)
    .contentLength(res.size())
    .contentType(MediaType.APPLICATION_OCTET_STREAM)
    .body(res.toByteArray)
}

Problem

The issues begin once we have very large log files. Once the log files are big (50 MB and above) we start to see two issues. The first is that once the REST request is sent it can take a while until the file starts to download. The second is that we start to get out-of-memory errors.
The problem lies in the fact that to create the zip file I wrote all the files into a ZipOutputStream backed by a ByteArrayOutputStream, so the whole zip file is built in memory. So what happens is that I need to read all the log files (maybe 500 MB) and create the complete zip, and only once the zip is created does the download begin. This explains both the delay and the out-of-memory errors.


Solution

The ultimate solution to this problem is to stream the files to the client as we load them. This way we will solve both issues. We should not be loading all the files into memory.
To do this we need, on one side, to read the log files through a buffer instead of loading each full file (org.apache.commons.compress.utils.IOUtils is a good utility for this), and on the other side, to write the data to a compressed stream as it is read. For that we will use GZIPOutputStream.
GZIPOutputStream compresses data as it is written and passes it on to the underlying stream. The flow is: we open a log file, read a chunk of it into a buffer, and write the buffer to the compressed stream, which is already being downloaded by the user. We repeat this until all the files are sent, so we have end-to-end streaming with memory consumption of roughly the buffer and no more.
The problem with GZIPOutputStream is that it compresses a single stream of bytes and does not support a folder structure. To solve this issue we will use the Apache Commons Compress class TarArchiveOutputStream. A tar archive supports both a folder structure and streaming. So we write the tar archive through the GZIPOutputStream (the TarArchiveOutputStream wraps the GZIPOutputStream), and we get a folder structure that is both compressed and streamed down to the client. Note that the result is a .tar.gz file rather than a .zip file.


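import java.io.{File, FileInputStream, OutputStream}
import org.apache.commons.compress.archivers.tar.{TarArchiveEntry, TarArchiveOutputStream}
import org.apache.commons.compress.utils.IOUtils

def createTar(stream: OutputStream, folderParents: Map[File, List[File]]): Unit = {
  val tar = new TarArchiveOutputStream(stream)
  try {
    folderParents.foreach { case (parentRoot, files) => createTar(tar, parentRoot, files) }
  }
  finally {
    // close() finishes the archive and also closes the wrapped stream
    tar.close()
  }
}

def createTar(tOut: TarArchiveOutputStream, parentRoot: File, files: List[File]): Unit = {
  files.foreach { file =>
    val relativePath = parentRoot.toURI().relativize(file.toURI()).getPath()
    // Use the relative path as the entry name; otherwise the entry keeps the file's full path
    val tarEntry = new TarArchiveEntry(file, relativePath)
    tarEntry.setSize(file.length())
    tOut.putArchiveEntry(tarEntry)
    val in = new FileInputStream(file)
    try {
      // Buffered copy: only one buffer of file data is in memory at a time
      IOUtils.copy(in, tOut)
    }
    finally {
      in.close()
      tOut.closeArchiveEntry()
    }
  }
}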
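
The IOUtils.copy call above is what keeps memory use bounded: it moves data from the file to the archive one buffer at a time. As an illustration only, here is a minimal sketch of that kind of copy loop using plain java.io. The object name BufferedCopy, the method name copy and the default buffer size are my own choices for this sketch, not the library's implementation.

```scala
import java.io.{InputStream, OutputStream}

object BufferedCopy {
  // Copies everything from `in` to `out` through a fixed-size buffer
  // and returns the number of bytes copied.
  // Memory use stays at `bufferSize` bytes no matter how large the input is.
  def copy(in: InputStream, out: OutputStream, bufferSize: Int = 8192): Long = {
    val buffer = new Array[Byte](bufferSize)
    var total = 0L
    var read = in.read(buffer)
    while (read != -1) {
      // Write only the bytes that were actually read in this round
      out.write(buffer, 0, read)
      total += read
      read = in.read(buffer)
    }
    total
  }
}
```

With an 8 KB buffer, copying a 500 MB log file this way still holds only 8 KB of file data in memory at any moment.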

val gzos = new GZIPOutputStream(outputStream)
// createTar closes the tar stream, which finishes the gzip stream and closes outputStream as well
createTar(gzos, folderParents)
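
If a .zip download is still preferred over .tar.gz, one alternative (not the approach used in this post) is to keep ZipOutputStream but write it directly to the response stream instead of a ByteArrayOutputStream. The sketch below assumes that `target` is the stream going to the client; the object name ZipStreaming and the method name streamZip are made up for illustration.

```scala
import java.io.{File, FileInputStream, OutputStream}
import java.util.zip.{ZipEntry, ZipOutputStream}

object ZipStreaming {
  // Writes the files into `target` as zip entries, one buffer at a time.
  // `target` is assumed to be the response stream; it is closed when the zip is finished.
  def streamZip(target: OutputStream, folderParents: Map[File, List[File]]): Unit = {
    val zip = new ZipOutputStream(target)
    val buffer = new Array[Byte](8192)
    try {
      for ((parentRoot, files) <- folderParents; file <- files) {
        // Entry name relative to the parent folder, as in createZip
        val relativePath = parentRoot.toURI.relativize(file.toURI).getPath
        zip.putNextEntry(new ZipEntry(relativePath))
        val in = new FileInputStream(file)
        try {
          var read = in.read(buffer)
          while (read != -1) {
            zip.write(buffer, 0, read)
            read = in.read(buffer)
          }
        } finally {
          in.close()
          zip.closeEntry()
        }
      }
    } finally zip.close()
  }
}
```

The memory problem in the first version came from collecting the output in a ByteArrayOutputStream, so either archive format can be streamed as long as it is written to the client's stream as it is produced.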