Using LLMs for code generation has slowly but surely become increasingly common. Using OpenAI models (GPT-4 or GPT-4o at the time of writing), I’m often disappointed with the quality of the default results. Providing the LLM with more information, which can be stored in the context window and accessed via vector embeddings, can improve the results noticeably. This is a concept I recently […]
Getting past reCAPTCHA v2 (by using 2Captcha)
Web scraping is more important now than ever. Gathering and selling data can be lucrative, and analyzing it can help businesses in any industry. As web scraping has become more common, measures to stop bots have evolved. Google’s reCAPTCHA v2 is a great example of this. In this article, I will briefly describe how this system works and how we can bypass it. First, let’s analyze the code of […]
Showing progress of GET/PUT using ‘requests’ & ‘clint’
Uploading or downloading large files can be tedious, especially when you’re unable to view the progress and status of the request. Using the requests library alongside clint, it’s easy to visually display progress in a console application. We’ll be able to specify the chunk size and monitor the download speed. In the snippets below, you’ll see a call to the function ‘progress.bar’. This is contained inside the clint.textui module, and it’s the function […]
Creating a ‘requests’ session from a selenium web driver
Python is frequently used for web scraping. Oftentimes, the ‘requests’ library is sufficient. However, it is typically only used for basic requests. We can send a GET request to a website, but what if the actual page is loaded via JavaScript? Using a real browser/web driver allows us to load the page completely. Instead of simply sending a request to a URL, we can automatically execute scripts and download […]
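One plausible way to do what the title describes is to copy the browser’s cookies and user agent into a ‘requests’ session. This is only a sketch under that assumption: the function name `session_from_driver` is hypothetical, and the full article may take a different approach.

```python
import requests


def session_from_driver(driver):
    """Build a requests.Session that reuses a Selenium driver's cookies and user agent."""
    session = requests.Session()

    # Ask the browser for its user agent so later requests look consistent.
    user_agent = driver.execute_script("return navigator.userAgent;")
    session.headers.update({"User-Agent": user_agent})

    # Selenium returns cookies as a list of dicts; copy the fields requests understands.
    for cookie in driver.get_cookies():
        session.cookies.set(
            cookie["name"],
            cookie["value"],
            domain=cookie.get("domain", ""),
            path=cookie.get("path", "/"),
        )
    return session
```

After loading and logging in with the web driver, `session_from_driver(driver).get(url)` can fetch further pages without the overhead of the browser, as long as those pages do not themselves depend on JavaScript.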