{"id":91,"date":"2019-07-04T01:14:14","date_gmt":"2019-07-04T01:14:14","guid":{"rendered":"http:\/\/justin.ooo\/?p=91"},"modified":"2023-06-20T03:08:38","modified_gmt":"2023-06-20T03:08:38","slug":"getting-past-recaptcha-v2-by-using-2captcha","status":"publish","type":"post","link":"https:\/\/justin.ooo\/index.php\/2019\/07\/04\/getting-past-recaptcha-v2-by-using-2captcha\/","title":{"rendered":"Getting past reCAPTCHA v2 (by using 2Captcha)"},"content":{"rendered":"\n<p>Web scraping is more important now than ever. Gathering and selling data can be lucrative, and analyzing it can help businesses in any industry. As web scraping becomes more and more common, measures to stop bots have evolved. Google&#8217;s reCAPTCHA v2 is a great example of this. In this article, I will briefly describe how this system works, and how we can bypass it.<\/p>\n\n\n\n<p>First, let&#8217;s analyze the code of this <a href=\"https:\/\/www.google.com\/recaptcha\/api2\/demo\">ReCAPTCHA demo<\/a> that Google provides.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter is-resized\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/justin.ooo\/wp-content\/uploads\/2019\/06\/firefox_JR4VTDNdIq.png\" alt=\"\" class=\"wp-image-95\" width=\"409\" height=\"545\" srcset=\"https:\/\/justin.ooo\/wp-content\/uploads\/2019\/06\/firefox_JR4VTDNdIq.png 452w, https:\/\/justin.ooo\/wp-content\/uploads\/2019\/06\/firefox_JR4VTDNdIq-225x300.png 225w\" sizes=\"(max-width: 409px) 100vw, 409px\" \/><\/figure><\/div>\n\n\n<pre class=\"wp-block-code\"><code lang=\"markup\" class=\"language-markup line-numbers\">&lt;form id=\"recaptcha-demo-form\" method=\"POST\"&gt;\n   &lt;fieldset&gt;\n      &lt;legend&gt;Sample Form with ReCAPTCHA&lt;\/legend&gt;\n      &lt;ul&gt;\n         &lt;li&gt;&lt;label for=\"input1\"&gt;First Name&lt;\/label&gt;&lt;input class=\"jfk-textinput\" id=\"input1\" name=\"input1\" type=\"text\" value=\"Jane\" disabled aria-disabled=\"true\"&gt;&lt;\/li&gt;\n         
&lt;li&gt;&lt;label for=\"input2\"&gt;Last Name&lt;\/label&gt;&lt;input class=\"jfk-textinput\" id=\"input2\" name=\"input2\" type=\"text\" value=\"Smith\" disabled aria-disabled=\"true\"&gt;&lt;\/li&gt;\n         &lt;li&gt;&lt;label for=\"input3\"&gt;Email&lt;\/label&gt;&lt;input class=\"jfk-textinput\" id=\"input3\" name=\"input3\" type=\"text\" value=\"stopallbots@gmail.com\" disabled aria-disabled=\"true\"&gt;&lt;\/li&gt;\n         &lt;li&gt;\n            &lt;p&gt;Pick your favorite color:&lt;\/p&gt;\n            &lt;label class=\"jfk-radiobutton-label\" for=\"option1\"&gt;&lt;input class=\"jfk-radiobutton-checked\" type=\"radio\" id=\"option1\" name=\"radios\" value=\"option1\" disabled aria-disabled=\"true\" checked aria-checked=\"true\"&gt;Red&lt;\/label&gt;&lt;label class=\"jfk-radiobutton-label\" for=\"option2\"&gt;&lt;input class=\"jfk-radiobutton\" type=\"radio\" id=\"option2\" name=\"radios\" value=\"option2\" disabled aria-disabled=\"true\"&gt;Green&lt;\/label&gt;\n         &lt;\/li&gt;\n         &lt;li&gt;\n            &lt;div class=\"\"&gt;\n               &lt;!-- BEGIN: ReCAPTCHA implementation example. --&gt;\n               &lt;div id=\"recaptcha-demo\" class=\"g-recaptcha\" data-sitekey=\"6Le-wvkSAAAAAPBMRTvw0Q4Muexq9bi0DJwx_mJ-\" data-callback=\"onSuccess\"&gt;&lt;\/div&gt;\n               &lt;script nonce=\"1KeZ0LVg4aug07anmgAULA\"&gt;\n                  var onSuccess = function(response) {\n                    var errorDivs = document.getElementsByClassName(\"recaptcha-error\");\n                    if (errorDivs.length) {\n                      errorDivs[0].className = \"\";\n                    }\n                    var errorMsgs = document.getElementsByClassName(\"recaptcha-error-message\");\n                    if (errorMsgs.length) {\n                      errorMsgs[0].parentNode.removeChild(errorMsgs[0]);\n                    }\n                    };\n               &lt;\/script&gt;&lt;!-- Optional noscript fallback. 
--&gt;\n               &lt;noscript&gt;\n                  &lt;div style=\"width: 302px; height: 462px;\"&gt;\n                     &lt;iframe src=\"\/recaptcha\/api\/fallback?k=6Le-wvkSAAAAAPBMRTvw0Q4Muexq9bi0DJwx_mJ-\" frameborder=\"0\" scrolling=\"no\"&gt;&lt;\/iframe&gt;\n                     &lt;div&gt;&lt;textarea id=\"g-recaptcha-response\" name=\"g-recaptcha-response\" class=\"g-recaptcha-response\"&gt;&lt;\/textarea&gt;&lt;\/div&gt;\n                  &lt;\/div&gt;\n                  &lt;br&gt;\n               &lt;\/noscript&gt;\n               &lt;!-- END: ReCAPTCHA implementation example. --&gt;\n            &lt;\/div&gt;\n         &lt;\/li&gt;\n         &lt;li&gt;&lt;input id=\"recaptcha-demo-submit\" type=\"submit\" value=\"Submit\"\/&gt;&lt;\/li&gt;\n      &lt;\/ul&gt;\n   &lt;\/fieldset&gt;\n&lt;\/form&gt;<\/code><\/pre>\n\n\n\n<p>The element we&#8217;ll be focusing on is the div with the &#8220;g-recaptcha&#8221; class. This widget is used in nearly every reCAPTCHA v2 implementation, and further documentation about it can be found <a href=\"https:\/\/developers.google.com\/recaptcha\/docs\/display\">here<\/a>. This element carries an attribute named data-sitekey, which is used to generate a response token (sent as &#8216;g-recaptcha-response&#8217; in the request). That token is how the server verifies that the captcha was solved when the form is submitted. Here&#8217;s a sample request in which I&#8217;ve solved the captcha on the demo and submitted the form:<\/p>\n\n\n\n<figure class=\"wp-block-image is-resized\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/i.imgur.com\/XPTjpca.png\" alt=\"\" width=\"970\" height=\"145\"\/><\/figure>\n\n\n\n<p>Our goal is to get a valid reCAPTCHA response token without any human interaction on our side. This can be done using captcha services such as 2Captcha, which I&#8217;ll be using in this example. We simply provide their API with the sitekey and the page URL, and they&#8217;ll give us a valid token. 
We&#8217;ll need the page URL because the domain that the token was generated on is used to verify the token. This prevents you from simply copying the sitekey into your own reCAPTCHA implementation and generating tokens that work on another website. This can obviously be bypassed by redirecting traffic locally on a subdomain via the hosts file, but that&#8217;s a topic for a different day. 2Captcha will handle all of this work for us, and it&#8217;s very inexpensive. At the time of writing, they charge $2.99 per 1,000 reCAPTCHA tokens, which works out to about three-tenths of a cent per captcha. Their API is documented <a href=\"https:\/\/2captcha.com\/2captcha-api\">here<\/a>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">A Basic Python Implementation<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"python\" class=\"language-python line-numbers\">import time\n\nimport requests\n\napi_key = \"\"  # your 2Captcha API key\napi_url = \"https:\/\/2captcha.com\/\"\n\ndef api_request(path, data):\n    # POST to the API; return the \"request\" field, or False on any failure.\n    data[\"key\"] = api_key\n    response = requests.post(api_url + path, data=data)\n    if response.status_code != 200:\n        return False\n    response_data = response.json()\n    if response_data[\"status\"] != 1:\n        return False\n    return response_data[\"request\"]\n\ndef check_request(request_id):\n    # Ask for the result of a previously submitted captcha.\n    data = {\n        \"action\": \"get\",\n        \"id\": request_id,\n        \"json\": 1\n    }\n    return api_request(\"res.php\", data)\n\ndef create_request(url, site_key):\n    # Submit a reCAPTCHA v2 task; returns the task id on success.\n    data = {\n        \"method\": \"userrecaptcha\",\n        \"googlekey\": site_key,\n        \"pageurl\": url,\n        \"json\": 1\n    }\n    return api_request(\"in.php\", data)\n\ndef solve_captcha(url, site_key):\n    request_id = create_request(url, site_key)\n    if not request_id:\n        # Submission failed, so there is nothing to poll for.\n        return False\n    time.sleep(10)\n    token = False\n    loops = 0\n    # Poll every 5 seconds, at most 50 times, until a token comes back.\n    while not token and loops &lt; 50:\n        time.sleep(5)\n        token = check_request(request_id)\n        loops += 1\n    return token<\/code><\/pre>\n\n\n\n<p>This should be pretty easy 
to implement. Call the solve_captcha function with the page URL and the data-sitekey value, and it should return a valid token. Note that due to the nature of the API, we&#8217;ll need to wait for a token. This implementation blocks the calling thread while it waits. If that&#8217;s a big deal, implement a simple callback and some multi-threading. Here&#8217;s a simple example, assuming the code above is saved as _2captcha.py, that gets a valid token for Google&#8217;s demo page:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"python\" class=\"language-python line-numbers\">import _2captcha\nurl = \"https:\/\/www.google.com\/recaptcha\/api2\/demo\"\nsitekey = \"6Le-wvkSAAAAAPBMRTvw0Q4Muexq9bi0DJwx_mJ-\"\ntoken = _2captcha.solve_captcha(url, sitekey)\nprint(token)<\/code><\/pre>\n\n\n\n<p>If your API key is valid and nothing went wrong on 2Captcha&#8217;s end, it should spit out a token. This typically takes 30 seconds to a minute, so be patient. Submit that token as the g-recaptcha-response parameter to a captcha-protected page, and you&#8217;ll be able to get past the captcha.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Web scraping is more important now than ever. Gathering and selling data can be lucrative, and analyzing it can help businesses in any industry. As web scraping becomes more and more common, measures to stop bots have evolved. Google&#8217;s reCAPTCHA v2 is a great example of this. In this article, I will briefly describe how this system works, and how we can bypass it. 
First, let&#8217;s analyze the code of [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":"","_links_to":"","_links_to_target":""},"categories":[2,3],"tags":[],"_links":{"self":[{"href":"https:\/\/justin.ooo\/index.php\/wp-json\/wp\/v2\/posts\/91"}],"collection":[{"href":"https:\/\/justin.ooo\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/justin.ooo\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/justin.ooo\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/justin.ooo\/index.php\/wp-json\/wp\/v2\/comments?post=91"}],"version-history":[{"count":25,"href":"https:\/\/justin.ooo\/index.php\/wp-json\/wp\/v2\/posts\/91\/revisions"}],"predecessor-version":[{"id":260,"href":"https:\/\/justin.ooo\/index.php\/wp-json\/wp\/v2\/posts\/91\/revisions\/260"}],"wp:attachment":[{"href":"https:\/\/justin.ooo\/index.php\/wp-json\/wp\/v2\/media?parent=91"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/justin.ooo\/index.php\/wp-json\/wp\/v2\/categories?post=91"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/justin.ooo\/index.php\/wp-json\/wp\/v2\/tags?post=91"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}