document-parsers
Community
Unlock any document's content.
Category: Data & Analytics
Tags: #data extraction, #rag, #pdf extraction, #text processing, #document parsing, #unstructured, #llamaparse
Author: HokageZ
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill extracts and structures information locked inside common document formats, so the content can be used for analysis, retrieval-augmented generation (RAG), and other downstream tasks.
Core Features & Use Cases
- Multi-Format Parsing: Handles PDFs, DOCX, HTML, and Markdown files.
- Advanced Extraction: Supports text, tables, and metadata extraction.
- AI-Powered Options: Integrates with LlamaParse for higher accuracy on complex documents.
- RAG Ready: Includes tools for splitting documents into chunks sized for embedding.
- Use Case: Suppose you are building a RAG system from a collection of research papers (PDF) and technical documentation (HTML, DOCX). This Skill parses each document, extracts the relevant text and tables, and chunks the results for your vector database.
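The last step of that workflow is chunking. The sketch below shows one simple strategy, fixed-size character windows with overlap. It is an assumption for illustration, not the Skill's own chunking tool, which may split by tokens, sentences, or document structure instead; `chunk_text`, `chunk_size`, and `overlap` are hypothetical names.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share `overlap` characters, so any span no longer
    than `overlap` characters appears whole in at least one chunk.
    (Hypothetical helper; character-based, not token-based.)
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("abcdefghij", chunk_size=4, overlap=1):
        print(chunk)
```

Overlap trades a little storage for recall: a sentence that straddles a boundary is less likely to be cut in half in every chunk that contains it.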
Quick Start
Use the document-parsers skill to extract all text and tables from the file 'report.pdf'.
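A request like the one above could be served by pdfplumber, one of the listed dependencies. The following is a minimal sketch under that assumption, not the Skill's own script: `extract_pdf` and `table_to_markdown` are hypothetical helper names. `pdfplumber.open`, `pdf.pages`, `page.extract_text`, and `page.extract_tables` are real pdfplumber calls; rendering tables as Markdown is one common choice for RAG ingestion, not something the Skill documents.

```python
from typing import Dict, List, Optional

# pdfplumber returns each table as a list of rows; each cell is a string or None.
Table = List[List[Optional[str]]]


def extract_pdf(path: str) -> Dict[str, list]:
    """Return per-page text and all tables found in a PDF (hypothetical helper)."""
    import pdfplumber  # imported lazily so table_to_markdown works without it

    pages_text: List[str] = []
    tables: List[Table] = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            # Image-only (scanned) pages yield no text layer, hence the fallback.
            pages_text.append(page.extract_text() or "")
            tables.extend(page.extract_tables())
    return {"text": pages_text, "tables": tables}


def table_to_markdown(table: Table) -> str:
    """Render one extracted table as a Markdown pipe table, first row as header."""
    if not table:
        return ""

    def row(cells: List[Optional[str]]) -> str:
        # None cells become empty strings; in-cell newlines are flattened.
        return "| " + " | ".join("" if c is None else str(c).replace("\n", " ") for c in cells) + " |"

    header, *body = table
    lines = [row(header), "| " + " | ".join("---" for _ in header) + " |"]
    lines += [row(r) for r in body]
    return "\n".join(lines)
```

Usage: `result = extract_pdf("report.pdf")`, then `table_to_markdown(result["tables"][0])` if any tables were found. Scanned pages return empty text and would need OCR, for example via pytesseract and pdf2image.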
Dependency Matrix
Required Modules
pypdf2, pdfplumber, python-docx, beautifulsoup4, lxml, unstructured[local-inference], pytesseract, pdf2image, llama-parse, llama-index-core
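Several of these distributions import under a different name than the one you install, and two of them wrap external programs (pytesseract calls the Tesseract executable; pdf2image uses Poppler's `pdftoppm`). A small pre-flight check can report what is missing. This is a sketch, not part of the Skill; the distribution-to-module mapping below reflects the usual import names and should be treated as an assumption.

```python
import importlib.util
import shutil
from typing import Dict, List

# pip distribution name -> top-level import name (assumed mapping)
REQUIRED: Dict[str, str] = {
    "pypdf2": "PyPDF2",
    "pdfplumber": "pdfplumber",
    "python-docx": "docx",
    "beautifulsoup4": "bs4",
    "lxml": "lxml",
    "unstructured": "unstructured",
    "pytesseract": "pytesseract",
    "pdf2image": "pdf2image",
    "llama-parse": "llama_parse",
    "llama-index-core": "llama_index.core",
}


def is_importable(module: str) -> bool:
    """True if the module can be found without importing it fully."""
    try:
        return importlib.util.find_spec(module) is not None
    except ModuleNotFoundError:  # raised when a parent of a dotted name is missing
        return False


def missing_packages(required: Dict[str, str] = REQUIRED) -> List[str]:
    """Distribution names whose import module cannot be found."""
    return [dist for dist, module in required.items() if not is_importable(module)]


def missing_binaries() -> List[str]:
    """External programs used by pytesseract (tesseract) and pdf2image (pdftoppm)."""
    return [exe for exe in ("tesseract", "pdftoppm") if shutil.which(exe) is None]


if __name__ == "__main__":
    print("missing Python packages:", missing_packages() or "none")
    print("missing system tools:", missing_binaries() or "none")
```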
Components
scripts, references, templates, examples
💻 Claude Code Installation
Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill: Name: document-parsers Download link: https://github.com/HokageZ/JOB-HUNTER/archive/main.zip#document-parsers Please download this .zip file, extract it, and install it in the .claude/skills/ directory.