Download Sample Word Files (DOCX / DOCM)
Download free sample Word documents. The modern Office Open XML format is actually a zipped archive of XML files. Use these files to test text extraction, layout fidelity, and security handling of VBA Macros.
STANDARD
Documents & Reports
| File Name | Contents / Specs | Size | Action |
|---|---|---|---|
| file-sample_100kB.docx (Standard Text) | Lorem Ipsum text with basic formatting (Bold, Italic, Lists). The baseline for any parser. | 100 KB | Download |
| complex_layout.docx (Rich Media) | Contains embedded images, nested tables, and headers/footers. Tests layout preservation. | 1 MB | Download |
| 100_pages_thesis.docx (Performance) | Heavy text content with Table of Contents (TOC). Use to test load times and pagination. | 2.5 MB | Download |
QA / SECURITY
Macros, Revisions & Corruption
| Test Case | Description | Size | Action |
|---|---|---|---|
| Macro Enabled (.docm) | Security risk: contains a VBA (Visual Basic for Applications) macro. Often blocked by email servers. Use it to test your file upload security filters. | 150 KB | Download |
| Track Changes (Redlines) | Contains hidden revision history (deleted text that is still in the file). Vital for testing metadata scrubbing. | 50 KB | Download |
| Corrupted (Bad XML) | The internal `document.xml` is malformed. Word will display an error when opening this. | 10 KB | Download |
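To check whether a metadata scrubber really removed tracked deletions, one option is to look for leftover `w:delText` elements in `word/document.xml`. The following is a minimal sketch using only the Python standard library. It assumes the common (transitional) WordprocessingML namespace, and the function name `find_deleted_text` and the in-memory demo document are illustrative, not part of the sample files.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Transitional WordprocessingML namespace (assumption: the file is not "Strict" OOXML).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def find_deleted_text(source):
    """Return the text of every tracked deletion still stored in the document body."""
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    # Deleted runs keep their text in <w:delText> instead of <w:t>.
    return [node.text or "" for node in root.iter(f"{{{W_NS}}}delText")]


# Hypothetical demo document, built in memory so the sketch runs without a file.
DEMO_XML = (
    f'<w:document xmlns:w="{W_NS}"><w:body><w:p>'
    '<w:r><w:t>Payment due in 30 days.</w:t></w:r>'
    '<w:del w:id="1" w:author="Editor">'
    '<w:r><w:delText>Penalty: 5% per week.</w:delText></w:r>'
    '</w:del>'
    '</w:p></w:body></w:document>'
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("word/document.xml", DEMO_XML)

print(find_deleted_text(buffer))
```

An empty list after scrubbing suggests the deletions are gone from the body. Headers, footers, and comments live in separate parts and would need the same check.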
Technical Specs: DOCX vs DOCM
- The ZIP Trick: A `.docx` file is just a ZIP archive. Rename it to `.zip` and extract it to see the XML structure. The text is stored in `word/document.xml`.
- Macros: A standard `.docx` cannot save macros. If you try to save VBA code in a `.docx`, Word will strip it out. You must use the `.docm` extension for files with macros.
- MIME Type: `application/vnd.openxmlformats-officedocument.wordprocessingml.document` for `.docx`. A `.docm` file uses `application/vnd.ms-word.document.macroEnabled.12`.
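The points above can be sketched in a few lines of Python with the standard `zipfile` module: list the parts inside the package, and flag a file that carries a VBA project regardless of its extension. Treat this as a heuristic rather than a complete upload filter. It assumes the VBA project is stored in a part named `vbaProject.bin` (Word conventionally uses `word/vbaProject.bin`), and the helper names and demo archive are made up for illustration.

```python
import io
import zipfile


def list_parts(source):
    """Return the names of all parts inside a DOCX/DOCM package."""
    with zipfile.ZipFile(source) as package:
        return package.namelist()


def contains_macros(source):
    """Heuristic: True if any part is named vbaProject.bin, whatever the extension says."""
    return any(name.lower().endswith("vbaproject.bin") for name in list_parts(source))


# Hypothetical in-memory package so the sketch runs without a sample file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("word/document.xml", "<w:document/>")
    package.writestr("word/vbaProject.bin", b"\x00")

print(list_parts(buffer))
print(contains_macros(buffer))
```

Checking the contents rather than the extension matters because a macro-enabled file can be renamed to `.docx` before upload.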
Frequently Asked Questions
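Macro-enabled `.docm` files contain executable code (VBA). Attackers use them to download malware once the user clicks “Enable Content” (often after first clicking “Enable Editing” to leave Protected View). QA teams use them to verify that their application rejects dangerous file types.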
If you open an old `.doc` file in modern Word, it runs in Compatibility Mode to ensure the layout doesn’t break. You can convert it to `.docx` via “File > Info > Convert”, but layout shifts may occur.
How to fix a corrupted DOCX?
Since DOCX is XML-based, you can often fix corruption manually if Word fails to open the file.
- The ZIP Method: Rename `file.docx` to `file.zip`. Open it. Find `word/document.xml`. This is where your text lives. You can recover the text even if the file structure is broken.
- LibreOffice Writer: Often opens corrupted Word files that Microsoft Word rejects, because its XML parser is more permissive.
- Pandoc: A command-line tool that can convert DOCX to Markdown or HTML, stripping out complex formatting that might be causing errors.
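When `word/document.xml` is malformed, a strict XML parser will refuse it, but the text runs can often still be pulled out with a pattern match. The sketch below automates the ZIP method under two assumptions: the ZIP container itself is still readable (otherwise `zipfile` raises `BadZipFile`), and the text sits in `<w:t>` elements as Word normally writes it. The function name `salvage_text` and the truncated demo XML are illustrative.

```python
import html
import io
import re
import zipfile

# Matches <w:t> and <w:t xml:space="preserve">, but not <w:tab/>, <w:tbl> or <w:tc>.
TEXT_RUN = re.compile(r"<w:t(?:\s[^>]*)?>(.*?)</w:t>", re.DOTALL)


def salvage_text(source):
    """Return one string per paragraph, ignoring whether the XML is well-formed."""
    with zipfile.ZipFile(source) as package:
        raw = package.read("word/document.xml").decode("utf-8", errors="replace")
    paragraphs = []
    # Paragraph boundaries are approximated by the closing </w:p> tag.
    for chunk in raw.split("</w:p>"):
        runs = TEXT_RUN.findall(chunk)
        if runs:
            # Decode entities such as &amp; that a real XML parser would have resolved.
            paragraphs.append(html.unescape("".join(runs)))
    return paragraphs


# Hypothetical broken document: the XML is cut off before the closing tags.
BROKEN_XML = (
    "<w:document><w:body>"
    "<w:p><w:r><w:t>Hello </w:t></w:r>"
    '<w:r><w:t xml:space="preserve">A &amp; B</w:t></w:r></w:p>'
    "<w:p><w:r><w:t>Second</w:t></w:r>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("word/document.xml", BROKEN_XML)

print(salvage_text(buffer))
```

This recovers plain text only. Formatting, tables, and images are lost, so it is a last resort after LibreOffice or Pandoc have failed.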
Developer’s Corner: Text Extraction
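Don’t try to parse the XML manually. Use python-docx to safely read text, tables, and headers.

    import docx

    # Load the document; 'contract.docx' is a placeholder for your own file.
    doc = docx.Document('contract.docx')

    # doc.paragraphs covers body paragraphs only; tables and headers are read separately.
    full_text = [para.text for para in doc.paragraphs]
    print('\n'.join(full_text))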
