Hello kids! Today we want to introduce you to a process we created for a specific task. Since the AWS platform is a common BlueGrid.io playground, that is where we did the work. The task we needed to solve recently was trivial yet interesting: we needed an AWS Lambda function to split a multipage PDF file into separate single-page files.
We started by solving the task locally with a Python script that splits a multipage PDF file into single-page files:
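from PyPDF2 import PdfFileReader, PdfFileWriter  # legacy PyPDF2 (< 3.0) API
# "document.pdf" is a placeholder name for the multipage input file
with open("document.pdf", "rb") as inputStream:
    inputpdf = PdfFileReader(inputStream)
    # Write each page of the input to its own single-page PDF
    for i in range(inputpdf.numPages):
        output = PdfFileWriter()
        output.addPage(inputpdf.getPage(i))
        with open("document-page%s.pdf" % i, "wb") as outputStream:
            output.write(outputStream)
Since the local test passed, we needed to give it a test run on Lambda. However, the following error appeared: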
[ERROR] OSError: [Errno 30] Read-only file system
The working directory is /var/task, and because the code above uses a relative path (“document-page%s.pdf”), the open() function tries to create the files there. In Lambda, /var/task is read-only, which causes the error above, so we need to update the open() call to create the files in the /tmp directory:
from PyPDF2 import PdfFileReader, PdfFileWriter  # legacy PyPDF2 (< 3.0) API
# "document.pdf" is a placeholder name for the multipage input file
with open("document.pdf", "rb") as inputStream:
    inputpdf = PdfFileReader(inputStream)
    for i in range(inputpdf.numPages):
        output = PdfFileWriter()
        output.addPage(inputpdf.getPage(i))
        # /tmp is the writable scratch directory in Lambda
        with open("/tmp/document-page%s.pdf" % i, "wb") as outputStream:
            output.write(outputStream)
The /tmp location is available during the execution of a Lambda function. Lambda reuses the execution environment when possible, and when it does, the contents of /tmp are preserved along with any background processes. However, Lambda doesn’t guarantee that an execution environment will be reused, so the contents of /tmp could disappear at any time.
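Because a warm invocation may still see files that a previous invocation left in /tmp, one option is to give every invocation its own scratch directory and delete it afterwards. The sketch below assumes that approach; `run_in_scratch_dir` and `write_pages` are hypothetical names, and `write_pages` only writes placeholder bytes where the real code would run the PyPDF2 loop. `tempfile.TemporaryDirectory()` creates the directory under `tempfile.gettempdir()`, which should resolve to /tmp on Lambda unless one of the TMPDIR, TEMP or TMP environment variables points elsewhere (pass `base_dir="/tmp"` to be explicit).

```python
import os
import tempfile


def run_in_scratch_dir(work, base_dir=None):
    """Call work(scratch_dir) inside a fresh temporary directory.

    The directory is created under base_dir (or tempfile.gettempdir() when
    base_dir is None) and removed, with everything in it, when work returns
    or raises, so nothing is left behind for the next warm invocation.
    """
    with tempfile.TemporaryDirectory(dir=base_dir) as scratch_dir:
        return work(scratch_dir)


def write_pages(scratch_dir, page_count=3):
    # Hypothetical stand-in for the PDF-splitting loop: writes placeholder
    # bytes instead of real single-page PDFs.
    names = []
    for i in range(page_count):
        path = os.path.join(scratch_dir, "document-page%s.pdf" % i)
        with open(path, "wb") as outputStream:
            outputStream.write(b"placeholder")
        names.append(os.path.basename(path))
    # Anything that must outlive the invocation (e.g. uploading the pages)
    # has to happen here, before the directory is deleted.
    return names


print(run_in_scratch_dir(write_pages))
# ['document-page0.pdf', 'document-page1.pdf', 'document-page2.pdf']
```

The trade-off is that files can no longer serve as a cache across warm invocations; if that reuse matters, writing to fixed /tmp paths and cleaning up explicitly is the alternative.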
Another important note is that AWS Lambda limits the compute and storage resources available to a function. By default, /tmp storage is limited to 512 MB; since 2022 the ephemeral storage size can be configured up to 10,240 MB.
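Since the split pages have to fit into /tmp, a function could check the free space before splitting a large file. This is a minimal sketch using the standard library's `shutil.disk_usage`; `has_room_in_tmp` and its `safety_margin` parameter are hypothetical, and how many bytes to require is an assumption you have to make yourself: the single-page files together may be larger than the input, because resources shared across pages (fonts, images) can end up copied into every page file.

```python
import shutil
import tempfile


def has_room_in_tmp(required_bytes, path=None, safety_margin=0):
    """Return True if the file system holding `path` has at least
    required_bytes + safety_margin bytes free.

    `path` defaults to tempfile.gettempdir() (normally /tmp on Lambda).
    """
    if path is None:
        path = tempfile.gettempdir()
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_bytes + safety_margin


# Example with arbitrary numbers: require twice the size of a hypothetical
# 50 MB input file, plus a 10 MB margin.
input_size = 50 * 1024 * 1024
print(has_room_in_tmp(2 * input_size, safety_margin=10 * 1024 * 1024))
```

If the check fails, the function can stop early with a clear error instead of failing halfway through the split with a "No space left on device" OSError.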