Improving Code Generation via LLMs – summawise

Using LLMs for code generation has slowly but surely become increasingly common. Using OpenAI (GPT-4 or GPT-4o at the time of writing), I’m often disappointed with the quality of the results offered by default. Giving the LLM more information, which can be stored in the context window and accessed via vector embeddings, improves the results noticeably. This is a concept I recently explored in an open-source program I developed called summawise.

By default, the tool lets you interact with a provided source by embedding its data in a vector store when a prompt session (also referred to as a “thread” on the OpenAI platform) is established. The model’s decision making is based on the concept of an OpenAI “assistant”, and new assistants can be created to tailor the model’s outputs to your needs. The following inputs are supported:

  • Local files. (Any type of content, file will be uploaded byte for byte)
  • Local directories. (Includes files in nested directories)
  • YouTube video URLs. (Transcript is extracted and used as text)
  • Other URLs, depending on the response content. (Text content, PDF files, and HTML are all supported)
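
As a rough illustration of how the four input kinds above could be told apart, here is a minimal Python sketch. It is not taken from summawise: the function name `classify_source`, the returned labels, and the list of YouTube hostnames are all assumptions made for this example.

```python
import os
from urllib.parse import urlparse

# Hypothetical list of hostnames treated as YouTube (an assumption for this sketch).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def classify_source(source: str) -> str:
    """Return 'directory', 'file', 'youtube', 'url', or 'unknown' for an input string."""
    # Local paths are checked first, so an existing path always wins.
    if os.path.isdir(source):
        return "directory"
    if os.path.isfile(source):
        return "file"

    # Anything else is only accepted if it parses as an http(s) URL with a host.
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https") and parsed.hostname:
        if parsed.hostname.lower() in YOUTUBE_HOSTS:
            return "youtube"  # the transcript would be extracted and used as text
        return "url"          # handling would depend on the response content

    return "unknown"


if __name__ == "__main__":
    for example in (".", "https://youtu.be/abc123", "https://example.com/paper.pdf"):
        print(example, "->", classify_source(example))
```

Checking the local filesystem before parsing URLs keeps the common case (a file or directory on disk) cheap and unambiguous; the real tool may well order or detect things differently.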

The ability to establish a thread with data the model can access outside of the immediate context window (and to summarize that data or generate code from it) comes from calling OpenAI’s models through its API, then extending ChatGPT’s default capabilities with features that are currently in OpenAI beta. Afterwards, you can explore the data further or generate code and information from an interactive prompt in your CLI, much like a web-based LLM client.
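
To make the idea of “data outside the context window” more concrete, the sketch below shows retrieval by vector similarity in plain Python. It is a toy under loud assumptions: real embeddings are dense vectors produced by a model, whereas this example uses simple word counts so that it runs without any dependencies or API key. The helper names (`embed`, `cosine`, `top_chunks`) and the sample chunks are invented for illustration and do not reflect how summawise or OpenAI’s vector stores are implemented.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_chunks(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k stored chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


# Hypothetical chunks, as if a source had been split up and stored.
chunks = [
    "def parse_config(path): load settings from a YAML file",
    "The transcript discusses rate limiting for HTTP clients",
    "def retry(func, attempts): call func again after a failure",
]

question = "how do I retry a failed call"
best = top_chunks(question, chunks, k=1)

# Only the retrieved chunk is placed in the prompt, not the whole source.
prompt = "Context:\n" + "\n".join(best) + "\n\nQuestion: " + question
print(prompt)
```

The point of the design is that the full source never has to fit in the prompt: only the chunks that score highest against the question are sent along with it.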


Installation

Summawise is available on PyPI for convenience:

pip install --upgrade summawise

Key Features and Usage

The tool is designed to be user-friendly and to integrate easily into various workflows. It supports a range of inputs, including YouTube video URLs for transcript extraction, local files and directories, and text, PDF, and HTML content from other URLs.

summawise --help  # Displays help information and usage instructions

While summawise aims to simplify the process of summarizing and analyzing large amounts of data, it’s still in the development phase and may not always deliver perfect results.


API Resources


Conclusion

Summawise is a practical tool that seeks to improve how developers and researchers handle and summarize large datasets and complex information. It’s a step forward in making powerful language models more accessible and useful in everyday tasks. For continuous updates and more information, you can visit the summawise GitHub repository or the PyPI package page.
