Eight, apparently, or is it three? PDFs are ubiquitous these days, and yet, like the internet, they haven’t been around for long. Here’s a quick round-up of what the different types of PDF are and how they are used, with a focus on the types of PDF that most of us use every day.
The PDF first appeared in 1993, and for most people it is now the de facto way to share digital documents. For those of us using PDFs, or building products that use them, it’s worth knowing that the humble PDF is not humble at all: there are many PDF types, each defined by a standard.
PDF types can be categorised in roughly two ways: technical and everyday. On the technical side, PDFs have ISO standards and the like: standards for different business sectors, for archiving, for engineering and for printing. There are point releases (have you heard of PDF 2.0?) and subsets (surely you know PDF/VT?). Like any good ISO standard, none of these impinge on our daily life, but they are the hidden backbone to it.
Of more interest to most of us are the PDF types of everyday parlance, which are much simpler to grasp. Depending on the way the file originated, there are three main types of PDF document. How the PDF was originally created determines whether its content (text, images, tables) can be accessed or whether it is “locked” in an image of the page.
Below I take a quick look at everyday PDFs.
Everyday PDF Types
1. Real PDFs
Real PDFs, also known as digitally created PDFs, are ideal for most applications. They let users mark up, annotate, search, and copy/paste without any extra step. You can easily create them in-app or via the “print” function. These PDFs are searchable by default, and content such as text and images can be copied and pasted into other file formats.
Both the metadata and the characters in the text are stored as electronic character codes. PDF editors and other document readers can therefore search through these PDFs. You can also select, edit, or delete any of the content they hold, unless the document is password protected.
2. Scanned PDFs
Scanned PDFs are just an image of the actual text, so the content is “locked” in a snapshot-like image. This is the same as converting a camera image, a screenshot, a JPG or a TIFF into a PDF. These image-only PDF files are not searchable, and their text usually cannot be modified or easily marked up. This is because they are scanned or photographed images of the pages, and thus have no underlying text layer.
You can convert these image-only PDFs from non-readable text into readable text with an Optical Character Recognition (OCR) engine, which adds an underlying text layer to the image-like PDF. Do note that this is not the same as simply producing text output: that results in a text document whose layout is probably quite different from the original PDF. See below for more detail.
3. Searchable PDFs
A searchable PDF is the result of applying Optical Character Recognition (OCR) to a non-readable, image-like PDF. During the text recognition process, the software analyses and ‘reads’ the characters and document structure. This results in a PDF file with two layers: one layer containing the image and a second layer containing the recognised text, which you can search, annotate and copy/paste just as you can in a real PDF. Such PDF files are almost indistinguishable from the original documents.
The gold standard is being able to convert PDF to text on the fly, in-application, when you need to. Casedo can do exactly that; take a look at How to Use OCR in Casedo.
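For those building products around PDFs, the three everyday types can be told apart by two facts about each page: whether it has a text layer, and whether it is covered by a scanned page image. The sketch below is only an illustration of that decision logic. The names `PageInfo`, `PdfType` and `classify_page` are hypothetical, and it assumes you already have those two facts from whatever PDF library you use; it does not read PDF files itself.

```python
from dataclasses import dataclass
from enum import Enum


class PdfType(Enum):
    REAL = "real"              # digitally created: text, no page scan
    SCANNED = "scanned"        # image only: no text layer
    SEARCHABLE = "searchable"  # page image plus an OCR text layer
    UNKNOWN = "unknown"        # e.g. a blank page


@dataclass
class PageInfo:
    # Hypothetical inputs: in practice these would come from a PDF library.
    has_text_layer: bool
    has_full_page_image: bool


def classify_page(page: PageInfo) -> PdfType:
    """Map the two page facts onto the three everyday PDF types."""
    if page.has_text_layer and page.has_full_page_image:
        return PdfType.SEARCHABLE
    if page.has_text_layer:
        return PdfType.REAL
    if page.has_full_page_image:
        return PdfType.SCANNED
    return PdfType.UNKNOWN


def needs_ocr(page: PageInfo) -> bool:
    """Only image-only pages need OCR to become searchable."""
    return classify_page(page) is PdfType.SCANNED
```

Note that a real PDF can still contain ordinary pictures; the assumption here is that `has_full_page_image` is true only when a single image covers the whole page, as a scan would.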
Conclusion
While the world of PDFs may seem straightforward at first glance, there is more complexity beneath the surface. From the highly technical ISO-standardised versions to the more commonly used everyday types, PDFs serve a variety of purposes and users.
Understanding the distinctions between Real PDFs, Scanned PDFs, and Searchable PDFs is essential for anyone working with digital documents, as each type has unique characteristics and functionalities. Whether you need to create, edit, or search through a PDF, knowing the type of PDF you’re dealing with will enable you to make the most of this versatile format.
Further Reading
- Types of PDFs. Available at https://pdf.abbyy.com/learning-center/pdf-types/ [Accessed 2024.08.28]
- 8 Types of PDF Standards – Each Serves a Unique Purpose. Available at https://www.marconet.com/blog/8-types-of-pdf-standards-each-serves-a-unique-purpose [Accessed 2024.08.28]
- All about PDF Editors, PDF Editing and PDF Translation. Available at https://www.iceni.com/blog/the-3-types-of-scanned-pdfs/ [Accessed 2024.08.28]
- PDF Evolution: ISO Standards, Subsets, Versions and Types. Available at https://www.investintech.com/resources/blog/archives/7967-pdf-iso-standards-subsets-types.html [Accessed 2024.08.28]
LAST UPDATED 2024.08.29