Getting past reCAPTCHA v2 (by using 2Captcha)

Web scraping is more important now than ever. Gathering and selling data can be lucrative, and analyzing it can help businesses in any industry. As web scraping becomes more and more common, measures to stop bots have evolved. Google’s reCAPTCHA v2 is a great example of this. In this article, I will briefly describe how this system works, and how we can bypass it.

First, let's analyze the code of the reCAPTCHA demo that Google provides.

<form id="recaptcha-demo-form" method="POST">
   <fieldset>
      <legend>Sample Form with ReCAPTCHA</legend>
      <ul>
         <li><label for="input1">First Name</label><input class="jfk-textinput" id="input1" name="input1" type="text" value="Jane" disabled aria-disabled="true"></li>
         <li><label for="input2">Last Name</label><input class="jfk-textinput" id="input2" name="input2" type="text" value="Smith" disabled aria-disabled="true"></li>
         <li><label for="input3">Email</label><input class="jfk-textinput" id="input3" name="input3" type="text" value="stopallbots@gmail.com" disabled aria-disabled="true"></li>
         <li>
            <p>Pick your favorite color:</p>
            <label class="jfk-radiobutton-label" for="option1"><input class="jfk-radiobutton-checked" type="radio" id="option1" name="radios" value="option1" disabled aria-disabled="true" checked aria-checked="true">Red</label><label class="jfk-radiobutton-label" for="option2"><input class="jfk-radiobutton" type="radio" id="option2" name="radios" value="option2" disabled aria-disabled="true">Green</label>
         </li>
         <li>
            <div class="">
               <!-- BEGIN: ReCAPTCHA implementation example. -->
               <div id="recaptcha-demo" class="g-recaptcha" data-sitekey="6Le-wvkSAAAAAPBMRTvw0Q4Muexq9bi0DJwx_mJ-" data-callback="onSuccess"></div>
               <script nonce="1KeZ0LVg4aug07anmgAULA">
                  var onSuccess = function(response) {
                    var errorDivs = document.getElementsByClassName("recaptcha-error");
                    if (errorDivs.length) {
                      errorDivs[0].className = "";
                    }
                    var errorMsgs = document.getElementsByClassName("recaptcha-error-message");
                    if (errorMsgs.length) {
                      errorMsgs[0].parentNode.removeChild(errorMsgs[0]);
                    }
                    };
               </script><!-- Optional noscript fallback. -->
               <noscript>
                  <div style="width: 302px; height: 462px;">
                     <iframe src="/recaptcha/api/fallback?k=6Le-wvkSAAAAAPBMRTvw0Q4Muexq9bi0DJwx_mJ-" frameborder="0" scrolling="no"></iframe>
                     <div><textarea id="g-recaptcha-response" name="g-recaptcha-response" class="g-recaptcha-response"></textarea></div>
                  </div>
                  <br>
               </noscript>
               <!-- END: ReCAPTCHA implementation example. -->
            </div>
         </li>
         <li><input id="recaptcha-demo-submit" type="submit" value="Submit"/></li>
      </ul>
   </fieldset>
</form>

The element we’ll be focusing on is the div with the “g-recaptcha” class. This widget is used in nearly every reCAPTCHA v2 implementation, and further documentation about it can be found here. The element carries an attribute named data-sitekey, which is used to generate a response token. That token is sent as the ‘g-recaptcha-response’ parameter when the form is submitted, so the server can verify that the captcha was solved. In the request sent when I solved the captcha on the demo and submitted the form, the token appears under that parameter name.

Our goal is to get a valid reCAPTCHA response token without any human interaction. This can be done using captcha-solving services such as 2Captcha, which I’ll be using in this example. We simply provide their API with the sitekey and the page URL, and they give us a valid token. The page URL is needed because the domain the token was generated on is checked when the token is verified. This prevents you from simply copying the sitekey into your own reCAPTCHA implementation and generating tokens that work on another website. This can obviously be bypassed by redirecting traffic locally on a subdomain via the hosts file, but that’s a topic for a different day. 2Captcha handles all of this work for us, and it’s very inexpensive. At the time of writing, they charge $2.99 per 1,000 reCAPTCHA tokens, which works out to about three tenths of a cent per captcha. Their API is documented here.
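The sitekey does not have to be copied by hand. As a minimal sketch using only the standard library, the parser below looks for elements with the g-recaptcha class and reads their data-sitekey attribute. The names SiteKeyParser and find_site_key are my own, and pages that render the widget from JavaScript rather than static markup would need a different approach.

```python
from html.parser import HTMLParser


class SiteKeyParser(HTMLParser):
    """Collects data-sitekey values from elements with the g-recaptcha class."""

    def __init__(self):
        super().__init__()
        self.site_keys = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # The class attribute may hold several space-separated class names.
        classes = (attributes.get("class") or "").split()
        site_key = attributes.get("data-sitekey")
        if "g-recaptcha" in classes and site_key:
            self.site_keys.append(site_key)


def find_site_key(html):
    """Return the first sitekey found in the HTML, or None if there is none."""
    parser = SiteKeyParser()
    parser.feed(html)
    return parser.site_keys[0] if parser.site_keys else None


# The widget div from Google's demo form shown earlier.
sample = (
    '<div id="recaptcha-demo" class="g-recaptcha" '
    'data-sitekey="6Le-wvkSAAAAAPBMRTvw0Q4Muexq9bi0DJwx_mJ-" '
    'data-callback="onSuccess"></div>'
)
print(find_site_key(sample))
```

Run against the demo markup, this prints the same sitekey that appears in the form above.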

A Basic Python Implementation

import requests
import json
import time

api_key = ""  # your 2Captcha API key
api_url = "https://2captcha.com/"

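def api_request(path, data):
    # Send a request to the 2Captcha API. Returns the "request" field of the
    # reply on success, or False on an HTTP error or a non-success status.
    data["key"] = api_key
    response = requests.post(api_url + path, data=data)

    if response.status_code != 200:
        return False

    response_data = json.loads(response.text)
    if response_data["status"] != 1:
        return False

    return response_data["request"]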

def check_request(id):

    data = {
        "action": "get",
        "id": id,
        "json": 1
    }

    return api_request("res.php", data)

def create_request(url, site_key):

    data = {
        "method": "userrecaptcha",
        "googlekey": site_key,
        "pageurl": url,
        "json": 1
    }

    return api_request("in.php", data)

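def solve_captcha(url, site_key):
    # Submit the captcha, then poll until a token comes back or we give up.
    captcha_id = create_request(url, site_key)
    if not captcha_id:
        return False  # submission failed, so there is nothing to poll for

    time.sleep(10)
    token = False
    loops = 0

    # Poll every 5 seconds, up to 50 times (a little over 4 minutes in total).
    while not token and loops < 50:
        time.sleep(5)
        token = check_request(captcha_id)
        loops += 1

    return token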
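The loop above treats every unsuccessful reply the same way, so a permanent failure such as a bad API key is retried until the loop runs out. As a hedged sketch, the variant below separates "not ready yet" from other errors. It assumes that a pending job is reported with status 0 and the request value CAPCHA_NOT_READY, which is how I understand 2Captcha's JSON replies, so check that against their current documentation. The fetch_result callable is a hypothetical stand-in for the res.php request, and the replies here are simulated.

```python
import time


def poll_for_token(fetch_result, attempts=50, delay=5.0):
    """Poll fetch_result() until it yields a token, a hard error, or we give up.

    fetch_result is a hypothetical callable returning the decoded JSON reply
    of a res.php request as a dict with "status" and "request" keys.
    """
    for _ in range(attempts):
        reply = fetch_result()
        if reply.get("status") == 1:
            return reply["request"]  # solved: this is the token
        if reply.get("request") != "CAPCHA_NOT_READY":
            # Anything other than "not ready" is treated as a permanent failure.
            raise RuntimeError("2Captcha error: %s" % reply.get("request"))
        time.sleep(delay)  # still being solved, so wait and retry
    raise TimeoutError("no token after %d attempts" % attempts)


# Simulated replies standing in for real API responses.
replies = iter([
    {"status": 0, "request": "CAPCHA_NOT_READY"},
    {"status": 0, "request": "CAPCHA_NOT_READY"},
    {"status": 1, "request": "example-token"},
])
print(poll_for_token(lambda: next(replies), delay=0))
```

Raising on hard errors means the caller finds out immediately instead of waiting out the full polling window.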

This should be pretty easy to use. Call the solve_captcha function with the page URL and the data-sitekey value, and it should return a valid token. Note that due to the nature of the API, we’ll need to wait for a token, and this implementation blocks the calling thread while it waits. If that’s a big deal, implement a simple callback and some multi-threading. Here’s a simple example that gets a valid token for Google’s demo page (the import assumes the code above is saved as _2captcha.py):

import _2captcha

url = "https://www.google.com/recaptcha/api2/demo"
sitekey = "6Le-wvkSAAAAAPBMRTvw0Q4Muexq9bi0DJwx_mJ-"
token = _2captcha.solve_captcha(url, sitekey)
print(token)
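For the callback-and-threads idea mentioned earlier, one possible shape is a small helper that runs the solver on a worker thread and hands the result to a callback. This is only a sketch: solve_in_background is a name I made up, and fake_solver stands in for solve_captcha so the example runs without network access or an API key.

```python
import threading


def solve_in_background(solver, url, site_key, callback):
    """Run solver(url, site_key) on a worker thread and pass the result to callback."""

    def worker():
        callback(solver(url, site_key))

    thread = threading.Thread(target=worker)
    thread.start()
    return thread


def fake_solver(url, site_key):
    # Stand-in for solve_captcha so the sketch runs offline.
    return "token-for-" + site_key


results = []
thread = solve_in_background(fake_solver, "https://example.com/", "demo-key", results.append)
thread.join()  # a real program would keep doing other work before joining
print(results)
```

In real use you would pass _2captcha.solve_captcha as the solver. The callback runs on the worker thread, so anything it touches should be safe to use from there.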

If your API key is valid and nothing went wrong on 2Captcha’s end, it should print a token. This typically takes 30 seconds to a minute, so be patient. Submit that token in the g-recaptcha-response parameter to a captcha-protected page, and you’ll be able to get past the captcha.
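To make the last step concrete, here is a small sketch of how the token fits into a form submission. It only builds the URL-encoded body. The helper name build_form_body and the comment field are hypothetical, and the real field names depend on the form you are submitting.

```python
from urllib.parse import urlencode


def build_form_body(token, fields=None):
    """Return the URL-encoded POST body for a form protected by reCAPTCHA v2."""
    payload = dict(fields or {})
    # The token goes in the same parameter the widget would have filled in.
    payload["g-recaptcha-response"] = token
    return urlencode(payload)


body = build_form_body("example-token", {"comment": "hello world"})
print(body)
```

With the requests library used earlier, the same dictionary can be passed as the data argument of requests.post, which encodes it in the same way.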
