AWS Lambda to split a multipage PDF into separate pages
Hello kids! Today we want to introduce you to a process we needed to create for a specific task. Given that the AWS platform is a common BlueGrid.io playground, this is where we'll do the work now. The task we needed to solve recently was trivial yet interesting: we needed AWS Lambda to split a multipage PDF file into separate single-page files.
We started on this task locally with a Python script we created. The script below is the one we then tested on AWS Lambda to split a multipage PDF file:
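    # Assumes PyPDF2 older than 3.0, which still provides these class names.
    from PyPDF2 import PdfFileWriter, PdfFileReader

    # "document.pdf" is a placeholder name for the multipage input file.
    inputpdf = PdfFileReader(open("document.pdf", "rb"))

    for i in range(inputpdf.numPages):
        # Copy page i into a new single-page PDF and save it.
        output = PdfFileWriter()
        output.addPage(inputpdf.getPage(i))
        with open("document-page%s.pdf" % i, "wb") as outputStream:
            output.write(outputStream)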
Since the local test passed, we gave the script a test run on Lambda. However, an error appeared:
[ERROR] OSError: [Errno 30] Read-only file system
The working directory is /var/task, and because the code above passes a relative path to open(), the function will try to create files there. That directory is read-only, which causes the error above, so we need to update the open() call to create files in /tmp:
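    # Assumes PyPDF2 older than 3.0, which still provides these class names.
    from PyPDF2 import PdfFileWriter, PdfFileReader

    # "document.pdf" is a placeholder name for the multipage input file.
    inputpdf = PdfFileReader(open("document.pdf", "rb"))

    for i in range(inputpdf.numPages):
        # Copy page i into a new single-page PDF and save it under /tmp.
        output = PdfFileWriter()
        output.addPage(inputpdf.getPage(i))
        with open("/tmp/document-page%s.pdf" % i, "wb") as outputStream:
            output.write(outputStream)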
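To see why the relative path failed, it can help to check where a file name actually resolves. The snippet below is a minimal sketch using only the standard library; `resolve_output_path` is a hypothetical helper name, not part of our script.

```python
import os


def resolve_output_path(filename, output_dir=None):
    """Return the absolute path that open(filename, "wb") would write to.

    With no output_dir, a relative file name resolves against the current
    working directory (/var/task on Lambda, which is read-only).
    With an output_dir such as "/tmp", the file lands in that directory.
    """
    if output_dir is None:
        return os.path.abspath(filename)
    return os.path.join(output_dir, filename)


# Relative name: ends up in whatever the working directory is.
print(resolve_output_path("document-page0.pdf"))
# Explicit writable directory: ends up under /tmp.
print(resolve_output_path("document-page0.pdf", "/tmp"))
```

Passing the directory explicitly keeps the writable location in one place if it ever needs to change.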
The /tmp location is available during the execution of a Lambda function. Lambda will reuse the function's execution environment when possible, and when it does, the content of /tmp will be preserved along with any process. However, Lambda doesn't guarantee that the environment will be reused for the next invocation, so the contents of /tmp could disappear at any time.
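Because /tmp may or may not carry over between invocations, one option is to give every invocation its own scratch directory and delete it when the work is done. This is a minimal sketch, not what our function does; `run_in_scratch_dir` and the `pdf-split-` prefix are hypothetical names, and on Lambda you would pass `base_dir="/tmp"`.

```python
import os
import tempfile


def run_in_scratch_dir(work, base_dir=None):
    """Call work(scratch_dir) with a fresh directory that is removed afterwards.

    base_dir=None means the system temporary directory; on Lambda the
    assumption is that "/tmp" would be passed here.
    """
    with tempfile.TemporaryDirectory(prefix="pdf-split-", dir=base_dir) as scratch:
        return work(scratch)


def example_work(scratch_dir):
    # Stand-in for the real splitting step: write one dummy page file.
    path = os.path.join(scratch_dir, "document-page0.pdf")
    with open(path, "wb") as output_stream:
        output_stream.write(b"%PDF-placeholder")
    return os.listdir(scratch_dir)


print(run_in_scratch_dir(example_work))
```

With this approach, leftovers from an earlier invocation never mix with the current one, and the function does not rely on files still being there later.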
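Another important note is that AWS Lambda limits the amount of computing and storage resources available to a function. The /tmp directory has a storage limit of 512 MB by default; newer Lambda configurations allow raising this ephemeral storage up to 10,240 MB.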
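Given that limit, it can be worth checking the free space before writing a large batch of pages. The following is a minimal sketch using the standard library's `shutil.disk_usage`; `has_room_for` is a hypothetical helper, and the 50 MB figure is only an example value.

```python
import shutil


def has_room_for(num_bytes, directory="/tmp"):
    """Return True if `directory` reports at least `num_bytes` of free space."""
    return shutil.disk_usage(directory).free >= num_bytes


# Example: check for 50 MB of free space in the current directory.
# On Lambda the assumption is that directory would be "/tmp".
if has_room_for(50 * 1024 * 1024, "."):
    print("enough space for the split pages")
else:
    print("not enough space, skip or clean up first")
```

Failing early with a clear message is usually easier to debug than an out-of-space error halfway through the split.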