oca-ocb-core/odoo-bringout-oca-ocb-base/doc/PATCH_PYPDF2_PDFWRITER.md
Ernad Husremovic 81050e9b17 Enhance PyPDF2 3.x compatibility with comprehensive monkey-patching
- Add force-override monkey-patches for deprecated methods (getObject, getData) in both PyPDF2.generic._base and PyPDF2.generic modules
- Create DecodedStreamObject wrapper for setData/getData compatibility
- Add explicit page copying after cloneReaderDocumentRoot in tests to fix empty PDF issue
- Update documentation with monkey-patching approach, troubleshooting guide, and test results
- Apply patches at module level in both pdf.py and ir_actions_report.py
- All PyPDF2 deprecation errors now resolved for PDF generation and attachment workflows

🤖 assisted by claude

2025-11-08 13:49:21 +01:00


PyPDF2 Compatibility Patch

Overview

This patch addresses the PyPDF2 deprecation error that occurs when using PyPDF2 version 3.0.0 or higher with Odoo. The original error was:

PyPDF2.errors.DeprecationError: PdfFileWriter is deprecated and was removed in PyPDF2 3.0.0. Use PdfWriter instead.

Problem

PyPDF2 3.0.0 removed several deprecated classes and methods. The old names now raise DeprecationError and must be replaced with the new ones:

  • PdfFileWriter → PdfWriter
  • PdfFileReader → PdfReader
  • addPage() → add_page()
  • addMetadata() → add_metadata()
  • getNumPages() → len(pages)
  • getPage(n) → pages[n]
  • appendPagesFromReader() → append_pages_from_reader()
  • _addObject() → _add_object()
  • cloneReaderDocumentRoot() → clone_reader_document_root()
  • setData() → set_data() (for DecodedStreamObject)
  • getData() → get_data() (for StreamObject and DecodedStreamObject)
  • getObject() → get_object() (for IndirectObject)

Solution

This patch provides backward compatibility by using two complementary approaches:

1. Wrapper Classes

Create wrapper classes that:

  • Inherit from the new PyPDF2 classes (PdfWriter, PdfReader)
  • Provide the old method signatures as compatibility methods
  • Gracefully handle both old and new PyPDF2 versions

2. Monkey-Patching (Critical for PyPDF2 3.x)

In PyPDF2 3.0+, deprecated methods still exist but raise DeprecationError. We must:

  • Force override deprecated methods at the base class level (PyPDF2.generic._base)
  • Override methods like getObject(), getData(), setData() to call their new equivalents
  • Apply patches BEFORE any PyPDF2 objects are created
  • Patch both in _base module and generic module for complete coverage

Critical Note: Adding the old methods only when they are missing doesn't work in PyPDF2 3.x, because the old names still exist and raise DeprecationError. They must be replaced unconditionally.
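The difference between adding and replacing can be seen in a minimal sketch that does not need PyPDF2 installed. FakeIndirectObject and the local DeprecationError are hypothetical stand-ins that only mimic the behavior described above; they are not PyPDF2's classes.

```python
class DeprecationError(Exception):
    """Stand-in for PyPDF2.errors.DeprecationError (assumption)."""


class FakeIndirectObject:
    """Hypothetical stand-in mimicking a PyPDF2 3.x class."""

    def get_object(self):
        return "resolved"

    def getObject(self):
        # The old name still exists, but only to raise.
        raise DeprecationError("getObject is deprecated and was removed")


def _getObject_compat(self):
    return self.get_object()


# Add-if-missing: a no-op here, because the raising stub is already defined.
if not hasattr(FakeIndirectObject, "getObject"):
    FakeIndirectObject.getObject = _getObject_compat

try:
    FakeIndirectObject().getObject()
    still_broken = False
except DeprecationError:
    still_broken = True

# Force override: replace the stub unconditionally.
FakeIndirectObject.getObject = _getObject_compat
result = FakeIndirectObject().getObject()
```

After the conditional assignment the old call still fails; only the unconditional assignment makes getObject() delegate to get_object().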

Files Modified

1. odoo/tools/pdf.py

  • Added compatibility wrapper classes PdfFileWriter and PdfFileReader
  • Added compatibility wrapper class DecodedStreamObject for setData() and getData() methods
  • Added force-override monkey-patches for:
    • IndirectObject.getObject() → calls get_object()
    • StreamObject.getData() → calls get_data()
    • Applied at both PyPDF2.generic._base and PyPDF2.generic levels
  • Updated import logic to handle both PyPDF2 2.x and 3.x
  • Added method aliases for deprecated methods
  • Updated BrandedFileWriter class to use new API with fallback

2. odoo/addons/base/models/ir_actions_report.py

  • Added compatibility import logic
  • Created local compatibility classes with required method aliases
  • Added support for numPages property and related methods
  • Added force-override monkey-patches for:
    • IndirectObject.getObject() → calls get_object()
    • StreamObject.getData() → calls get_data()
    • DecodedStreamObject.getData() → calls get_data()
    • Applied at both PyPDF2.generic._base and PyPDF2.generic levels

3. odoo/addons/base/tests/test_pdf.py

  • Added explicit page copying after cloneReaderDocumentRoot() calls in all test methods
  • This fixes the critical PyPDF2 3.x issue where only document structure is copied, not content pages

Implementation Details

Critical PyPDF2 3.x Fix - Page Content Copying

In PyPDF2 3.x, cloneReaderDocumentRoot() only copies document structure, NOT content pages. This was causing 327-byte PDFs with no actual content. Modules using this method now include explicit page copying:

writer.cloneReaderDocumentRoot(reader)
# Copy all pages from the reader to the writer (required for PyPDF2 3.x)
for page_num in range(reader.getNumPages()):
    page = reader.getPage(page_num)
    writer.addPage(page)
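One way to keep call sites consistent is to put the clone and the copy loop behind a single helper. The sketch below is illustrative only: clone_with_pages is a hypothetical name, and StubReader and StubWriter are stand-ins that imitate the structure-only behavior described above so the example runs without PyPDF2.

```python
class StubReader:
    """Hypothetical reader exposing the old-style page accessors."""

    def __init__(self, pages):
        self._pages = list(pages)

    def getNumPages(self):
        return len(self._pages)

    def getPage(self, page_num):
        return self._pages[page_num]


class StubWriter:
    """Hypothetical writer whose clone copies structure but no pages."""

    def __init__(self):
        self.root = None
        self.pages = []

    def cloneReaderDocumentRoot(self, reader):
        self.root = "document-root"  # structure only, pages stay empty

    def addPage(self, page):
        self.pages.append(page)


def clone_with_pages(writer, reader):
    """Clone the document root, then copy every page explicitly."""
    writer.cloneReaderDocumentRoot(reader)
    for page_num in range(reader.getNumPages()):
        writer.addPage(reader.getPage(page_num))
    return len(writer.pages)


writer = StubWriter()
copied = clone_with_pages(writer, StubReader(["page-1", "page-2"]))
```

Returning the page count gives callers a cheap check that the output is not an empty document.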

Compatibility Import Pattern

try:
    from PyPDF2 import PdfReader, PdfWriter

    # Create compatibility classes
    class PdfFileWriter(PdfWriter):
        def addPage(self, page):
            return self.add_page(page)

        def addMetadata(self, metadata):
            return self.add_metadata(metadata)

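        def _addObject(self, obj):
            return self._add_object(obj)

        # Aliases for the remaining writer methods listed in the mapping table
        def appendPagesFromReader(self, reader, after_page_append=None):
            return self.append_pages_from_reader(reader, after_page_append)

        def cloneReaderDocumentRoot(self, reader):
            return self.clone_reader_document_root(reader)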

    class PdfFileReader(PdfReader):
        def getNumPages(self):
            return len(self.pages)

        def getPage(self, page_num):
            return self.pages[page_num]

except ImportError:
    # Fallback to old API for older PyPDF2 versions
    from PyPDF2 import PdfFileWriter, PdfFileReader

# DecodedStreamObject compatibility wrapper
from PyPDF2.generic import DecodedStreamObject as _DecodedStreamObject

class DecodedStreamObject(_DecodedStreamObject):
    """Compatibility wrapper for PyPDF2 3.x DecodedStreamObject"""

    def setData(self, data):
        """Compatibility method for set_data()"""
        if hasattr(self, 'set_data'):
            return self.set_data(data)
        else:
            return super().setData(data)

    def getData(self):
        """Compatibility method for get_data()"""
        if hasattr(self, 'get_data'):
            return self.get_data()
        else:
            return super().getData()

# Monkey-patch PyPDF2 generic objects for compatibility
# CRITICAL: In PyPDF2 3.x, old methods exist but raise DeprecationError
# We MUST override them, not just add them
try:
    import PyPDF2.generic._base as pdf_base

    # Override getObject to call get_object without deprecation warning
    if hasattr(pdf_base.IndirectObject, 'get_object'):
        def _getObject_compat(self):
            return self.get_object()
        # Force override even if getObject exists (it raises DeprecationError in 3.x)
        pdf_base.IndirectObject.getObject = _getObject_compat

    # Also patch in the generic module
    from PyPDF2.generic import IndirectObject
    if hasattr(IndirectObject, 'get_object'):
        IndirectObject.getObject = _getObject_compat

except (ImportError, AttributeError):
    pass

try:
    from PyPDF2.generic import StreamObject

    # Override getData to call get_data without deprecation warning
    if hasattr(StreamObject, 'get_data'):
        def _getData_compat(self):
            return self.get_data()
        # Force override even if getData exists (it raises DeprecationError in 3.x)
        StreamObject.getData = _getData_compat
except (ImportError, AttributeError):
    pass

Key Points for Successful Patching

  1. Patch at Base Module Level: Import PyPDF2.generic._base and patch classes there
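  2. Force Override: Don't skip the patch because the old method already exists. In PyPDF2 3.x it exists only to raise DeprecationError, so always replace it (checking only that the new method is available)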
  3. Double Patch: Patch both _base module and generic module
  4. Early Application: Apply patches at module import time, before any PDF objects are created
  5. Error Handling: Use (ImportError, AttributeError) to handle both missing modules and attributes
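The key points can be folded into one small reusable helper. This is a sketch under assumptions, not the code shipped in the patch: force_alias is a hypothetical name, and FakeStreamObject stands in for a PyPDF2 3.x class so the example runs on its own.

```python
class DeprecationError(Exception):
    """Stand-in for PyPDF2.errors.DeprecationError (assumption)."""


class FakeStreamObject:
    """Hypothetical stand-in for a PyPDF2 3.x stream class."""

    def get_data(self):
        return b"data"

    def getData(self):
        raise DeprecationError("getData is deprecated and was removed")


def force_alias(cls, old_name, new_name):
    """Point cls.old_name at cls.new_name; return True if patched."""
    if not hasattr(cls, new_name):
        # Older library without the new API: leave the class untouched.
        return False

    def _compat(self, *args, **kwargs):
        # Resolve the new method at call time so subclass overrides apply.
        return getattr(self, new_name)(*args, **kwargs)

    _compat.__name__ = old_name
    setattr(cls, old_name, _compat)  # unconditional: replaces the raising stub
    return True


# Module-level calls, so the patches exist before any object is created.
patched = force_alias(FakeStreamObject, "getData", "get_data")
skipped = force_alias(FakeStreamObject, "getObject", "get_object")
data = FakeStreamObject().getData()
```

The helper checks only for the new method, never for the old one, which is what distinguishes a force override from an add-if-missing patch.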

Method Compatibility Mapping

| Old Method (PyPDF2 < 3.0) | New Method (PyPDF2 ≥ 3.0) | Compatibility Method |
| --- | --- | --- |
| PdfFileWriter.addPage() | PdfWriter.add_page() | Wrapped |
| PdfFileWriter.addMetadata() | PdfWriter.add_metadata() | Wrapped |
| PdfFileWriter._addObject() | PdfWriter._add_object() | Wrapped |
| PdfFileReader.getNumPages() | len(PdfReader.pages) | Wrapped |
| PdfFileReader.getPage() | PdfReader.pages[] | Wrapped |
| PdfFileWriter.appendPagesFromReader() | PdfWriter.append_pages_from_reader() | Wrapped |
| PdfFileWriter.cloneReaderDocumentRoot() | PdfWriter.clone_reader_document_root() | Wrapped |
| DecodedStreamObject.setData() | DecodedStreamObject.set_data() | Wrapped |
| DecodedStreamObject.getData() | DecodedStreamObject.get_data() | Wrapped |
| StreamObject.getData() | StreamObject.get_data() | Monkey-patched |
| IndirectObject.getObject() | IndirectObject.get_object() | Monkey-patched |

Testing

The patch has been successfully tested with:

  • PyPDF2 3.0.1 (new API with deprecation errors)
  • PyPDF2 2.x (old API via fallback)
  • OdooPdfFileWriter instantiation
  • PDF generation workflows
  • Report generation (original error case)
  • PDF attachment operations (account_edi_ubl_cii module)
  • All deprecated method calls now work without errors

Test Results

All PyPDF2 deprecation errors resolved:

  • PdfFileWriter → Working
  • PdfFileReader → Working
  • setData() → Working
  • getData() → Working
  • getObject() → Working
  • PDF report generation → Working
  • PDF attachments → Working

Branch Information

  • Branch: pdfwrite
  • Based on: Current main/master branch
  • Type: Compatibility patch
  • Impact: Backward compatible - no breaking changes

Author

  • Developer: Ernad Husremović (hernad@bring.out.ba)
  • Company: bring.out.doo Sarajevo
  • Date: 2025-09-02

This patch resolves the PyPDF2 deprecation error encountered in:

  • Report generation (/report/pdf/ endpoints)
  • PDF merge operations
  • PDF attachment handling
  • Account EDI PDF operations

Troubleshooting

If you still get DeprecationError after applying the patch:

  1. Check Module Load Order: Ensure odoo/tools/pdf.py is loaded before any PDF operations
  2. Verify Monkey-Patch Application: The patches must be applied at module import time
  3. Check PyPDF2 Version: Run python3 -c "import PyPDF2; print(PyPDF2.__version__)"
  4. Restart Server Completely: Use a full server restart, not just a module reload
  5. Check for Multiple PyPDF2 Installations: Ensure only one PyPDF2 version is installed
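Steps 3 and 5 can also be checked from Python. The following is a rough diagnostic sketch using only the standard library; installed_version and needs_compat_patches are hypothetical helper names, and checking for pypdf alongside PyPDF2 is an assumption about what might be co-installed.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def needs_compat_patches(version):
    """True when the major version is 3 or higher (the removed-API range)."""
    return version is not None and int(version.split(".")[0]) >= 3


# Report which of the two distributions are present in this environment.
found = {name: installed_version(name) for name in ("PyPDF2", "pypdf")}
for name, version in found.items():
    print(f"{name}: {version or 'not installed'}")
```

Run it with the same interpreter and virtual environment that starts the Odoo server, otherwise it may report a different installation.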

Common Issues:

Issue: getObject is deprecated and was removed

  • Cause: Monkey-patch not applied or overridden by later imports
  • Solution: Ensure patches are at module level, not inside functions

Issue: setData is deprecated and was removed

  • Cause: Using original DecodedStreamObject instead of wrapper
  • Solution: Ensure wrapper class is used for all DecodedStreamObject instances

Issue: Empty PDFs (327 bytes)

  • Cause: cloneReaderDocumentRoot() doesn't copy pages in PyPDF2 3.x
  • Solution: Always add explicit page copying after cloneReaderDocumentRoot() calls

Future Considerations

While this patch provides immediate compatibility, consider the following longer-term steps:

  1. Migrating Odoo's own calls to the new PyPDF2 API directly, so the wrappers and monkey-patches can be dropped
  2. Monitoring the PyPDF2 changelog for future deprecations
  3. Testing with future PyPDF2 versions
  4. Migrating to pypdf, the maintained successor to PyPDF2

Installation

This patch is applied automatically when using the pdfwrite branch. No additional installation steps are required.