We are excited to announce that version 1.4.0 of ComPDFKit PDF SDK is now available! This version adds a powerful new capability for protecting PDF documents: redaction. Let's dive into what's new!
What Is Redaction
Redaction is the process of permanently removing information from a PDF file, so that the redacted content is not only invisible but also can no longer be found, even with text search. The content we support redacting includes text, images, and vector graphics.
Redacting is different from hiding. There are many ways to hide content, such as placing a black box over the information or putting black text on a black background. Hiding makes the information invisible to the naked eye, but it does not make the information inaccessible: others can change the attributes of the hidden text, or copy and paste it into another application, to make it visible again. The reason is that a PDF page can contain multiple layers of data, so covering content on a page does not remove the original data underneath.
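To make the difference concrete, here is a minimal sketch in Python. It is a toy model, not the ComPDFKit API: the names `TextRun`, `FilledRect`, and `extract_text` are hypothetical, and a page is simplified to a plain list of drawn items. It only illustrates that a covering box leaves the text in the page data, while removing the text run does not.

```python
from dataclasses import dataclass


@dataclass
class TextRun:
    """Hypothetical stand-in for a piece of text stored in a page."""
    x: float
    y: float
    text: str


@dataclass
class FilledRect:
    """Hypothetical stand-in for an opaque black box drawn on a page."""
    x: float
    y: float
    width: float
    height: float


def extract_text(page):
    """Collect every text run in the page data, whatever is drawn over it."""
    return " ".join(item.text for item in page if isinstance(item, TextRun))


page = [
    TextRun(72, 700, "Name: Jane Doe"),
    TextRun(72, 680, "SSN: 123-45-6789"),
]
cover = FilledRect(70, 676, 200, 16)

# Hiding: the box is added on top, but the text run is still stored in the page.
hidden = page + [cover]

# Redacting: the text run itself is dropped before the box is drawn.
redacted = [
    item for item in page
    if not (isinstance(item, TextRun) and item.text.startswith("SSN"))
] + [cover]

print(extract_text(hidden))    # the SSN line is still extractable
print(extract_text(redacted))  # the SSN line is gone from the data
```

In this simplified model both pages would look identical on screen, yet only the second one has actually lost the sensitive text.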
There are two steps to redact content:
- First, redaction annotations have to be created in the areas that should be redacted. This step won’t remove any content from the document yet; it just marks regions for redaction.
- Second, to actually remove the content, the redaction annotations need to be applied. In this step, the page content within the region of the redaction annotations is irreversibly removed.
Content is actually removed only once the redaction annotations have been applied. Until then, redaction annotations can be edited or removed just like any other annotation.
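The mark-then-apply workflow described above can be sketched as follows. This is an illustrative Python model under stated assumptions, not the ComPDFKit API: `ToyPage`, `mark_for_redaction`, `unmark`, and `apply_redactions` are hypothetical names, and the sketch drops a whole item whenever it overlaps a marked region, whereas a real implementation would presumably remove only the covered part.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rect:
    left: float
    bottom: float
    right: float
    top: float

    def intersects(self, other: "Rect") -> bool:
        """True when the two rectangles share some area."""
        return (self.left < other.right and other.left < self.right
                and self.bottom < other.top and other.bottom < self.top)


@dataclass
class PageItem:
    """Hypothetical piece of page content with its bounding box."""
    bounds: Rect
    content: str


@dataclass
class ToyPage:
    items: list = field(default_factory=list)
    redaction_marks: list = field(default_factory=list)

    def mark_for_redaction(self, region: Rect) -> None:
        """Step 1: record a region. Page content is left untouched."""
        self.redaction_marks.append(region)

    def unmark(self, region: Rect) -> None:
        """Marks can still be removed before they are applied."""
        self.redaction_marks.remove(region)

    def apply_redactions(self) -> int:
        """Step 2: drop every item overlapping a mark; return how many were removed."""
        before = len(self.items)
        self.items = [
            item for item in self.items
            if not any(item.bounds.intersects(mark) for mark in self.redaction_marks)
        ]
        self.redaction_marks.clear()
        return before - len(self.items)


page = ToyPage(items=[
    PageItem(Rect(72, 700, 200, 712), "Name: Jane Doe"),
    PageItem(Rect(72, 680, 200, 692), "SSN: 123-45-6789"),
])
ssn_region = Rect(70, 678, 210, 694)
marked_by_mistake = Rect(70, 698, 210, 714)

page.mark_for_redaction(ssn_region)
page.mark_for_redaction(marked_by_mistake)
items_after_marking = len(page.items)   # marking alone removes nothing

page.unmark(marked_by_mistake)          # still editable at this stage
removed = page.apply_redactions()       # only now is content removed
remaining = [item.content for item in page.items]
```

Keeping marking and applying as separate steps is what allows a reviewer to check or correct the marked regions before anything is irreversibly removed.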
Reasons for Redaction
As the world moves toward a paperless society, more and more information is shared in digital formats such as PDF documents. PDF files are well suited to storing and distributing information. In some circumstances, however, sensitive information must be removed from a PDF file before it is shared.
Redaction is needed to protect important information in nearly every industry. Government institutions and the legal industry, which deal with sensitive information frequently, are probably the first that come to mind. In addition, the medical field must protect protected health information (PHI) as required under HIPAA. Redaction is also a good choice for students who do not want to leak data when sharing their papers.
Types of Sensitive or Private Information
Here are some types of personal and sensitive information that you may want to remove:
- ID card numbers
- Social security numbers
- Home addresses
- Birth dates
- Private phone numbers
- Financial account numbers
- Judicial records
- Trade secrets
- Classified national secrets
To provide a better user experience, we offer flexible redaction options that you can choose from according to your requirements. Besides manually selecting the area you want to remove, you can also redact a whole page or even the entire file. Redacting all occurrences of a particular piece of text in a PDF document is supported as well.
We hope you enjoy the new feature! As always, all feedback is welcome, and we will continue to expand our capabilities. Stay tuned!