AWS Lambda to split a multipage PDF into separate pages


Hello kids! Today we want to walk you through a process we needed to create for a specific task. Given that the AWS platform is a common BlueGrid.io playground, this is where we’ll do the work. The task we needed to solve recently was trivial yet interesting: we needed an AWS Lambda function to split a multipage PDF file into separate single-page files.

We started by solving this task locally with a Python script we created. The script below splits a multipage PDF file into single pages, and it is the same script we later tested on AWS Lambda:

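from PyPDF2 import PdfFileWriter, PdfFileReader  # legacy PyPDF2 API (before 3.0)
# Assumed input file name; the original snippet did not show how inputpdf was opened.
inputpdf = PdfFileReader(open("document.pdf", "rb"))
# Write every page of the source document to its own single-page PDF.
for i in range(inputpdf.numPages):
    output = PdfFileWriter()
    output.addPage(inputpdf.getPage(i))
    with open("document-page%s.pdf" % i, "wb") as outputStream:
        output.write(outputStream)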

Since the local test passed, the next step was a test run on Lambda. There, however, an error appeared:

[ERROR] OSError: [Errno 30] Read-only file system

The working directory of a Lambda function is /var/task, which is read-only. Because the code above uses a relative path (“document-page%s.pdf”), the open() function tries to create the files there, and that causes the error above. The fix is to have open() create the files in the /tmp directory instead:

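from PyPDF2 import PdfFileWriter, PdfFileReader  # legacy PyPDF2 API (before 3.0)
# Assumed input file name; the original snippet did not show how inputpdf was opened.
inputpdf = PdfFileReader(open("document.pdf", "rb"))
# Write every page to its own file under /tmp, the writable location on Lambda.
for i in range(inputpdf.numPages):
    output = PdfFileWriter()
    output.addPage(inputpdf.getPage(i))
    with open("/tmp/document-page%s.pdf" % i, "wb") as outputStream:
        output.write(outputStream)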
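
To keep the choice of output directory in one place, the path handling can be pulled into a small helper. The following is only a sketch using the standard library; the names SCRATCH_DIR, resolve_output_path and is_writable are hypothetical and not part of the original script.

```python
import os

SCRATCH_DIR = "/tmp"  # the writable location used in the article


def resolve_output_path(filename, scratch_dir=SCRATCH_DIR):
    """Return a path for filename directly inside scratch_dir.

    Any directory part of filename is dropped, so the result never
    points back into the (read-only) working directory.
    """
    return os.path.join(scratch_dir, os.path.basename(filename))


def is_writable(directory):
    """Report whether this process may create files in directory."""
    return os.path.isdir(directory) and os.access(directory, os.W_OK | os.X_OK)
```

With this in place, the open() call in the loop could use resolve_output_path("document-page%s.pdf" % i), and is_writable() offers a quick check of a directory before the first write.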

The /tmp location is writable during the execution of a Lambda function. Lambda reuses the execution environment when possible, and when it does, the contents of /tmp are preserved along with any background processes. However, Lambda doesn’t guarantee that an execution environment will be reused, so the contents of /tmp may be gone by the next invocation.
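
Because files may or may not survive between invocations, one cautious approach is to give every invocation its own throwaway directory and let it be deleted at the end. The sketch below uses only the standard library; process_in_scratch and its page_payloads argument are hypothetical stand-ins for the real page-writing step, and base_dir=None relies on Python's default temporary directory, which we would expect to be /tmp on Lambda.

```python
import os
import tempfile


def process_in_scratch(page_payloads, base_dir=None):
    """Write each payload to its own file in a temporary directory.

    page_payloads is a hypothetical list of bytes objects standing in
    for the single-page PDFs. Returns the written file sizes and the
    directory path that was used (already removed on return).
    """
    sizes = []
    # TemporaryDirectory deletes the directory and its contents on exit,
    # so nothing from this invocation lingers for a later warm start.
    with tempfile.TemporaryDirectory(dir=base_dir) as workdir:
        for i, payload in enumerate(page_payloads):
            path = os.path.join(workdir, "document-page%s.pdf" % i)
            with open(path, "wb") as output_stream:
                output_stream.write(payload)
            sizes.append(os.path.getsize(path))
        # Upload or otherwise consume the files here, before cleanup.
    return sizes, workdir
```

The design choice here is to treat /tmp purely as scratch space: anything that must outlive the invocation should be copied elsewhere before the with block ends.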

Another important note is that AWS Lambda limits the amount of compute and storage resources. Storage in the /tmp directory is limited to 512 MB by default, and can be configured up to 10,240 MB.
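
Given that limit, it can be worth checking the remaining space before writing a large batch of pages. This is a minimal sketch built on shutil.disk_usage from the standard library; the helper name has_room and the way the required size is estimated are assumptions, not something taken from the original script.

```python
import shutil


def has_room(required_bytes, path="/tmp"):
    """Return True if the filesystem holding path has enough free space."""
    return shutil.disk_usage(path).free >= required_bytes
```

A rough upper bound for required_bytes could be the size of the source PDF, although the total size of the split pages may differ from it.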

Mile Stojaković

Navigating the intersections of cutting-edge technology domains at BlueGrid.io. In charge of operations in the BlueGrid.io organisation with focus on cybersecurity compliance.

Running marathons and trail races. Enjoying good coffee and trusting no one to make one but myself!
