Analyzing documents using LangChain and OpenAI API: A comprehensive guide

Last updated: 2023/09/18 at 9:30 PM

Informed decision-making often hinges on extracting insights from documents and data. However, when handling sensitive information, privacy concerns become paramount. LangChain, in conjunction with the OpenAI API, empowers you to analyze your local documents without the necessity of uploading them online.

Contents
  • Preparing your environment
  • Getting an OpenAI API key
  • Importing the required libraries
  • Loading the document for analysis

This works by keeping your files local, using embeddings and vectorization for analysis, and running the processing in your own environment. Note that the text passed to the OpenAI API for embedding and querying is still sent to OpenAI; however, OpenAI does not use data submitted by customers through its API to train its models or improve its services.

Preparing your environment

To avoid conflicts between library versions, first create a new Python virtual environment and activate it. Then install the necessary libraries with the terminal command below:

pip install langchain openai tiktoken faiss-cpu pypdf

Here’s a breakdown of the purpose of each library in your setup:

  1. LangChain: The framework for building chains of language-model calls. It provides modules for document loading, text splitting, embeddings, and vector storage.
  2. OpenAI: The client library used to send queries to a language model and retrieve the results.
  3. tiktoken: Counts the tokens (units of text) in a given text. This matters because the OpenAI API charges by the number of tokens processed, so tiktoken helps you track your consumption.
  4. FAISS: Creates and manages a vector store, offering fast retrieval of similar vectors based on their embeddings. This is the basis for text similarity search and clustering.
  5. PyPDF: Extracts text from PDF documents, so that PDF files can be loaded and their content passed on for analysis.
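
To make the vector store idea from the FAISS item more concrete, here is a minimal pure-Python sketch of similarity search. It is only an illustration, not how FAISS is implemented: the function names, the three-dimensional vectors, and the use of cosine similarity are all assumptions chosen for readability, whereas FAISS works with optimized indexes and supports several distance metrics.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(query, store, k=1):
    """Return the k texts whose vectors are most similar to the query vector."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]

# Hypothetical toy "embeddings"; real embeddings have far more dimensions
toy_store = [
    ("invoice totals", [0.9, 0.1, 0.0]),
    ("meeting notes", [0.1, 0.9, 0.2]),
    ("travel policy", [0.0, 0.2, 0.9]),
]

print(most_similar([0.8, 0.2, 0.1], toy_store))
```

A real vector store does the same job at scale: each document chunk is stored alongside its embedding, and a query is answered by finding the stored vectors nearest to the query's embedding.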

Once these libraries are installed, your environment is configured and ready for document analysis.

Getting an OpenAI API key

Every request to the OpenAI API must include an API key. The key lets OpenAI verify that the request comes from a valid source and that you have permission to use the requested functionality. To obtain an OpenAI API key, follow these steps:

  1. Access the OpenAI platform: Go to the OpenAI platform on their website.
  2. Access your account profile: If you have an OpenAI account, log in. Once logged in, locate your account profile, usually found in the top-right corner of the platform.
  3. View API keys: Within your account profile, there should be an option to “View API keys.” Click on this option to access the API keys page.
  4. Create a new secret key: On the API keys page, you’ll find a button labeled “Create new secret key.” Click on it.
  5. Provide a name for your key: Give your new API key a descriptive name to help you identify its purpose or usage.
  6. Generate the key: After naming your key, create it by clicking the “Create new secret key” button once more. OpenAI will generate a unique API key for you.
  7. Copy and store securely: Once the key is generated, copy it and store it securely. For security reasons, OpenAI won’t display it again through your account. If you misplace or lose this secret key, you’ll need to create a new one.

These steps give you a valid OpenAI API key for authenticating your requests. Keep the key secure and never share it publicly: it grants access to OpenAI’s capabilities, and any usage under it may be charged to your account.
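
As a rough illustration of what including the key in a request means, the sketch below builds the HTTP headers for a call to the OpenAI REST API, which expects the key as a bearer token in the Authorization header. The helper name build_auth_headers and the placeholder key are hypothetical; in practice the client libraries used in this guide set this header for you.

```python
def build_auth_headers(api_key):
    """Build the HTTP headers that carry the API key as a bearer token."""
    if not api_key or not api_key.strip():
        raise ValueError("An API key is required")
    return {
        "Authorization": f"Bearer {api_key.strip()}",
        "Content-Type": "application/json",
    }

# Placeholder value for illustration only, not a real key
headers = build_auth_headers("sk-placeholder")
print(headers["Authorization"])
```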

Importing the required libraries

from langchain.document_loaders import PyPDFLoader, TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

To use the libraries installed in your virtual environment, you need to import them. Note that every class here is imported through LangChain, which provides the loaders, text splitter, embeddings, vector store, and question-answering chain used in this guide as wrappers around the underlying libraries.

Loading the document for analysis

Begin by establishing a variable to store your API key. This variable will be used subsequently in the code for authentication purposes.

# Hardcoded API key
openai_api_key = "Your API key"
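
As an alternative to the hardcoded value, here is a minimal sketch that reads the key from an environment variable. OPENAI_API_KEY is the variable name conventionally used with the OpenAI library; the helper name get_openai_api_key is hypothetical.

```python
import os

def get_openai_api_key(env_var="OPENAI_API_KEY"):
    """Read the API key from an environment variable, failing loudly if it is unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running this script"
        )
    return key
```

With the variable set in your shell, openai_api_key = get_openai_api_key() can replace the hardcoded assignment, and the key never appears in the source file.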

Avoid hardcoding your API key, especially if you intend to share your code with others. For production code that will be distributed, store sensitive values such as API keys in environment variables instead.

Next, create a function that loads a document. It should accept either a PDF or a text file and raise a ValueError for any other file type.

def load_document(filename):
    """Load a PDF or text file and split it into overlapping chunks."""
    # Pick a loader based on the file extension
    if filename.endswith(".pdf"):
        loader = PyPDFLoader(filename)
    elif filename.endswith(".txt"):
        loader = TextLoader(filename)
    else:
        raise ValueError("Invalid file type")

    documents = loader.load()

    # Split on newlines into chunks of about 1000 characters with a 30-character overlap
    text_splitter = CharacterTextSplitter(chunk_size=1000,
                                          chunk_overlap=30, separator="\n")

    return text_splitter.split_documents(documents=documents)

After loading the documents, the function creates a CharacterTextSplitter to segment the loaded content into smaller chunks based on characters, then returns the resulting chunks.

Dividing the document into smaller chunks serves the purpose of creating manageable segments while maintaining some degree of context overlap between them. This approach proves beneficial for various tasks, including text analysis and information retrieval.
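
To show what overlapping chunks look like, here is a simplified pure-Python sketch. It is not CharacterTextSplitter itself, which splits on the separator and then merges the pieces, so its chunk boundaries will differ; the function name chunk_text and the sample string are hypothetical.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=30):
    """Split text into fixed-size character chunks that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in chunk_text(sample, chunk_size=10, chunk_overlap=3):
    print(chunk)
```

Each chunk starts with the last three characters of the previous one, so a sentence that straddles a boundary still appears with some of its surrounding context in at least one chunk.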
