import streamlit as st
import docx
import PyPDF2
from transformers import pipeline
import tempfile

# Load Hugging Face model
@st.cache_resource
def load_pipeline():
    return pipeline("question-answering", model="deepset/roberta-base-squad2")

qa_pipeline = load_pipeline()

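def read_pdf(file):
    """Return the text of every page in a PDF, one page per line block."""
    text = ""
    pdf_reader = PyPDF2.PdfReader(file)
    for page in pdf_reader.pages:
        # Guard against pages with no extractable text (e.g. scanned images).
        text += (page.extract_text() or "") + "\n"
    return text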

def read_word(file):
    doc = docx.Document(file)
    text = ""
    for para in doc.paragraphs:
        text += para.text + "\n"
    return text

def extract_text(uploaded_file):
    file_type = uploaded_file.name.split('.')[-1].lower()
    if file_type == 'pdf':
        text = read_pdf(uploaded_file)
    elif file_type == 'docx':
        text = read_word(uploaded_file)
    else:
        st.error("Unsupported file type. Please upload a PDF or Word file.")
        text = None
    return text
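

# Optional helper (a sketch, not part of the original app): the
# question-answering pipeline returns a dict with "answer", "score", "start"
# and "end" keys. The name format_answer and the 0.1 threshold are assumptions
# chosen for illustration; tune the threshold for your own documents.
def format_answer(result, min_score=0.1):
    """Turn a QA pipeline result dict into a short display string."""
    answer = (result.get("answer") or "").strip()
    score = float(result.get("score", 0.0))
    # Treat empty or low-scoring answers as "not found" rather than showing them.
    if not answer or score < min_score:
        return "No confident answer found in the document."
    return f"Answer: {answer} (confidence: {score:.2f})"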

# Streamlit interface
def main():
    st.title("📄 File Reader & Hugging Face Q&A Application")
    st.write("Upload a PDF or Word file and ask questions based on its content.")

    # File upload
    uploaded_file = st.file_uploader("Choose a PDF or Word file", type=["pdf", "docx"])

    if uploaded_file is not None:
        # Extract and display text.
        # Pass the uploaded file object straight through: extract_text() reads
        # its .name to detect the type, and both PyPDF2 and python-docx accept
        # file-like objects, so no temporary copy on disk is needed.
        file_text = extract_text(uploaded_file)
        if file_text:
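            # Show a preview, marking it as truncated only when it really is.
            preview = file_text[:1000]
            if len(file_text) > 1000:
                preview += "... (truncated for display)"
            st.text_area("File Content", preview)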

            # Question-answering
            question = st.text_input("Ask a question based on the file content:")

            if st.button("Get Answer"):
                if question.strip():
                    try:
                        result = qa_pipeline(question=question, context=file_text)
                        st.success(f"Answer: {result['answer']}")
                    except Exception as e:
                        st.error(f"Error generating answer: {str(e)}")
                else:
                    st.warning("Please enter a question.")

if __name__ == "__main__":
    main()