LangChain Transformer

Overview

A document transformation service that uses LangChain's document loader ecosystem to convert files in various formats into structured text content. It supports PDF, HTML, CSV, and general text processing through multiple specialized loaders.

Service Information

Property	Value
Service Name	Langchain transformer
Status	Enabled
Compatible Nodes	Transform Content

Key Features

Multiple PDF loaders: PyMuPDF, PDFMiner, PyPDF, and PDFPlumber, each suited to different PDF structures
Structured data support: CSV and JSON processing with configurable parsing
Web content: HTML parsing with BeautifulSoup
Unstructured data: General text extraction via the Unstructured API
Batch processing: Transform multiple documents in a single operation
Metadata preservation: Maintains document metadata through transformation
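
The batch processing and metadata preservation features can be pictured with a minimal sketch. The `Doc` class and `transform_batch` function below are hypothetical stand-ins written for illustration only; they are not the service's actual types or API. The sketch assumes that each document carries a text body plus a metadata dictionary, and that a transform changes the text while the metadata is copied through untouched.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a document: text plus free-form metadata."""
    text: str
    metadata: Dict[str, object] = field(default_factory=dict)


def transform_batch(docs: List[Doc], transform: Callable[[str], str]) -> List[Doc]:
    """Apply `transform` to every document's text in one call.

    Each output gets its own copy of the input metadata, so the
    transformation never drops or mutates the original metadata.
    """
    return [Doc(text=transform(d.text), metadata=dict(d.metadata)) for d in docs]


# Illustrative batch; the file names are made up.
batch = [
    Doc("  Quarterly report  ", {"source": "report.pdf", "page": 1}),
    Doc("  Hello, world  ", {"source": "index.html"}),
]
result = transform_batch(batch, str.strip)
```

Copying the metadata dictionary rather than sharing it keeps later edits to a transformed document from leaking back into the source document.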

Inputs

Input	Type	Required	Description
Content	Content selection / Import Set	Yes	Documents to transform (PDF, HTML, CSV, text files)

Parameters

Transform Source Selection

Parameter	Type	Required	Description
Transform source	Choice	Yes	File type to transform: one of "PDF", "HTML", "CSV", or "Text"

Service Selection by Source Type

PDF Sources

Service	Description	Best For
PyMuPDF	Fast, accurate PDF text extraction	Most PDFs, especially with complex layouts
PDFMiner	Detailed text positioning and layout analysis	PDFs requiring precise text location
PyPDF	Simple, lightweight PDF reader	Basic text extraction from simple PDFs
PDFPlumber	Table and structured data extraction	PDFs with tables and forms
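
For readers who want to reproduce the PDF service choice outside the service, the sketch below maps each service name in the table to a loader class in the `langchain_community.document_loaders` module. The mapping and the helper names (`PDF_LOADER_CLASSES`, `loader_class_name`, `load_pdf`) are assumptions made for illustration; the service's internal wiring is not documented here and may differ.

```python
import importlib

# Assumed mapping from the service names in the table to loader class
# names in langchain_community.document_loaders.
PDF_LOADER_CLASSES = {
    "PyMuPDF": "PyMuPDFLoader",
    "PDFMiner": "PDFMinerLoader",
    "PyPDF": "PyPDFLoader",
    "PDFPlumber": "PDFPlumberLoader",
}


def loader_class_name(service: str) -> str:
    """Return the loader class name for a PDF service, or raise ValueError."""
    try:
        return PDF_LOADER_CLASSES[service]
    except KeyError:
        raise ValueError(f"Unknown PDF service: {service!r}") from None


def load_pdf(path: str, service: str = "PyMuPDF"):
    """Load a PDF with the chosen service and return its documents.

    LangChain is imported lazily, so the mapping above can be inspected
    even when langchain_community is not installed.
    """
    module = importlib.import_module("langchain_community.document_loaders")
    loader_cls = getattr(module, loader_class_name(service))
    return loader_cls(path).load()
```

A call such as `load_pdf("invoice.pdf", service="PDFPlumber")` (the file name is a placeholder) would return a list of documents, each with page content and metadata. Each loader also needs its underlying PDF package installed, for example `pdfplumber` for PDFPlumber.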