ReportingCloud: Extract Plain Text from Adobe PDF, MS Word or HTML Documents

The ReportingCloud document/convert endpoint now supports plain text as a return format, enabling text extraction from Adobe PDF, MS Word DOCX, and HTML documents. The .NET SDK provides a method to convert uploaded documents and retrieve their complete plain text content.

The document/convert endpoint accepts a document, such as an Adobe PDF, MS Word DOCX or HTML document, and converts it into another format.

https://api.reporting.cloud/v1/document/convert

In addition, all endpoints that return documents can now return plain text (TXT) as well. This allows you to extract plain text from Adobe PDF documents or MS Word Office Open XML (DOCX) documents.

The request parameters of the document/convert endpoint include the returnFormat parameter, which accepts one of the following formats: PDF, PDFA, RTF, DOC, DOCX, HTML, TXT and TX.
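
As a rough illustration of the parameter, the following sketch checks a requested format against the list above and builds a request URL for the endpoint. BuildConvertUrl and ConvertUrlSketch are hypothetical names, not part of the ReportingCloud SDK, and the sketch assumes that returnFormat is passed as a query string parameter. Check the API documentation for the exact request shape and authentication.

```csharp
using System;
using System.Collections.Generic;

public static class ConvertUrlSketch
{
    // Return formats listed in this post for the document/convert endpoint.
    private static readonly HashSet<string> ReturnFormats = new HashSet<string>(
        new[] { "PDF", "PDFA", "RTF", "DOC", "DOCX", "HTML", "TXT", "TX" },
        StringComparer.OrdinalIgnoreCase);

    private const string Endpoint = "https://api.reporting.cloud/v1/document/convert";

    // Hypothetical helper (not an SDK method).
    // Assumption: returnFormat is sent as a query string parameter.
    public static string BuildConvertUrl(string returnFormat)
    {
        if (returnFormat == null || !ReturnFormats.Contains(returnFormat))
        {
            throw new ArgumentException(
                "Unsupported return format: " + returnFormat, nameof(returnFormat));
        }

        // Normalize to upper case so "txt" and "TXT" produce the same URL.
        return Endpoint + "?returnFormat="
            + Uri.EscapeDataString(returnFormat.ToUpperInvariant());
    }

    public static void Main()
    {
        Console.WriteLine(BuildConvertUrl("txt"));
    }
}
```

Validating the format on the client side rejects a mistyped value before any document is uploaded.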

To extract plain text from an Adobe PDF document, send a digitally born PDF document (one created digitally rather than scanned from paper) and request TXT (plain text) as the return format.

The following code uses the ReportingCloud .NET Framework SDK (C#) to extract plain text from an Adobe PDF document:

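using System;
using System.IO;
using System.Text;

// Create the ReportingCloud client with your API key.
TXTextControl.ReportingCloud.ReportingCloud rc =
    new TXTextControl.ReportingCloud.ReportingCloud("yourAPIKey");

// Upload the PDF and request plain text (TXT) as the return format.
byte[] bResults = rc.ConvertDocument(
    File.ReadAllBytes("document.pdf"),
    TXTextControl.ReportingCloud.ReturnFormat.TXT);

// Decode the returned bytes; Encoding.ASCII replaces non-ASCII characters with '?'.
Console.WriteLine(Encoding.ASCII.GetString(bResults));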

Test this on your own: create a free ReportingCloud trial account or read the full documentation.
