The document/convert endpoint takes a document, such as an Adobe PDF, MS Word DOCX or HTML document, and converts it into another format.

https://api.reporting.cloud/v1/document/convert

In addition, all endpoints that return documents can now return plain text (TXT) as well. This allows you to extract plain text from Adobe PDF documents or MS Word Office Open XML (DOCX) documents.

The document/convert method takes a returnFormat request parameter that accepts the following formats: PDF, PDFA, RTF, DOC, DOCX, HTML, TXT and TX.
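As a rough illustration of how the endpoint URL and the returnFormat values fit together, the following sketch validates a format name against the list above and builds the request URI. The class ConvertRequest and the method BuildUri are hypothetical helpers, not part of the ReportingCloud SDK, and the sketch assumes that returnFormat is sent as a query string parameter:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of the ReportingCloud SDK.
public static class ConvertRequest
{
    // Endpoint URL as given in this article.
    private const string Endpoint = "https://api.reporting.cloud/v1/document/convert";

    // Return formats listed in this article; matched case-insensitively for convenience.
    private static readonly HashSet<string> ReturnFormats =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { "PDF", "PDFA", "RTF", "DOC", "DOCX", "HTML", "TXT", "TX" };

    // Assumption: returnFormat is passed as a query string parameter.
    public static Uri BuildUri(string returnFormat)
    {
        if (returnFormat == null || !ReturnFormats.Contains(returnFormat))
        {
            throw new ArgumentException(
                "Unsupported return format: " + returnFormat, nameof(returnFormat));
        }

        return new Uri(Endpoint + "?returnFormat=" + returnFormat.ToUpperInvariant());
    }
}
```

Calling ConvertRequest.BuildUri("txt") yields a URI that ends in ?returnFormat=TXT, while an unlisted value such as "ODT" is rejected before any request is sent.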

To extract plain text from an Adobe PDF document, send a digitally born PDF document (one created from a digital source rather than from scanned page images) and request TXT (plain text) as the return format.

The following code uses the ReportingCloud .NET Framework SDK (C#) to extract plain text from an Adobe PDF document:

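using System;
using System.IO;
using System.Text;

// Create the client with your ReportingCloud API key.
TXTextControl.ReportingCloud.ReportingCloud rc =
    new TXTextControl.ReportingCloud.ReportingCloud("yourAPIKey");
// Upload the PDF and request plain text (TXT) as the return format.
byte[] bResults = rc.ConvertDocument(
    File.ReadAllBytes("document.pdf"),
    TXTextControl.ReportingCloud.ReturnFormat.TXT);
// Note: ASCII decoding replaces any non-ASCII character with '?'.
Console.WriteLine(Encoding.ASCII.GetString(bResults));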

Test this on your own: create a free ReportingCloud trial account or read the full documentation.