Creating a ‘requests’ session from a selenium web driver

Python is frequently used for web scraping. Often, the ‘requests’ library is sufficient, but it only sends plain HTTP requests. We can send a GET request to a website, but what if the actual page is loaded via JavaScript? Using a real browser through a web driver allows us to load the page completely: instead of simply sending a request to a URL, the browser automatically executes scripts and downloads resources.

In my case, I’m scraping a website which requires me to be logged in. However, the login page has a variety of security implementations that can’t easily be circumvented with simple HTTP requests. After realizing this, I decided to use a selenium webdriver to complete the login. After logging in, I simply needed the session information (cookies) established by the login request to scrape the rest of the site.

Selenium webdriver objects have a get_cookies method, which returns a list of dicts. Here are the keys in each dictionary, alongside their type and a brief description:

name (string): The name of the cookie.
value (string): The value of the cookie.
domain (string): The domain of the server the cookie is sent to.
path (string): The URL path for which the cookie is sent.
secure (bool): Cookie is only sent to the server in encrypted requests.
httpOnly (bool): Prevents the cookie from being accessed through client side scripts.
expiry (int): Unix timestamp indicating when the cookie expires. This key is optional.
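
As a rough illustration of reading these keys defensively, here is a minimal sketch. The helper name summarize_cookie and the sample values are made up for this example and are not part of selenium. It assumes, as the list above suggests, that expiry may be missing from a cookie dict.

```python
# Hypothetical helper: the name and the sample data are illustrative only.
REQUIRED_KEYS = ("name", "value", "domain", "path", "secure")


def summarize_cookie(cookie_raw):
    """Return a one-line description of a cookie dict shaped like the list above."""
    missing = [key for key in REQUIRED_KEYS if key not in cookie_raw]
    if missing:
        raise KeyError(f"cookie dict is missing keys: {missing}")
    # 'expiry' is optional, so read it with .get() instead of indexing
    expiry = cookie_raw.get("expiry")
    lifetime = "no expiry set" if expiry is None else f"expires at {expiry}"
    return f"{cookie_raw['name']} for {cookie_raw['domain']}{cookie_raw['path']} ({lifetime})"


sample = {"name": "cookie1", "value": "whatever", "domain": ".justin.ooo",
          "path": "/", "secure": True, "httpOnly": False}
print(summarize_cookie(sample))
```

Indexing with cookie_raw['expiry'] would raise a KeyError for a cookie like the sample, which is why the sketch uses .get().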

For example, serializing the list returned by get_cookies with json.dumps produces output like this:

[
  {
    "name": "cookie1",
    "value": "whatever",
    "path": "/",
    "domain": ".justin.ooo",
    "secure": true,
    "httpOnly": false,
    "expiry": 1590978394
  },
  {
    "name": "cookie2",
    "value": "doesn't matter",
    "path": "/",
    "domain": "justin.ooo",
    "secure": true,
    "httpOnly": false,
    "expiry": 1559528794
  }
]
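
Because the list is plain JSON-compatible data, it can also be written to disk and read back later. The sketch below is only an illustration: the function names save_cookies and load_cookies, the file name, and the sample data are assumptions made for this example.

```python
import json
import os
import tempfile

# Sample data shaped like the output of get_cookies(); the values are placeholders.
cookies = [
    {"name": "cookie1", "value": "whatever", "path": "/", "domain": ".justin.ooo",
     "secure": True, "httpOnly": False, "expiry": 1590978394},
]


def save_cookies(cookie_list, file_path):
    """Write a list of cookie dicts to a JSON file."""
    with open(file_path, "w", encoding="utf-8") as handle:
        json.dump(cookie_list, handle, indent=2)


def load_cookies(file_path):
    """Read a list of cookie dicts back from a JSON file."""
    with open(file_path, encoding="utf-8") as handle:
        return json.load(handle)


# Round trip through a temporary directory to show that nothing is lost.
cookie_file = os.path.join(tempfile.mkdtemp(), "cookies.json")
save_cookies(cookies, cookie_file)
restored = load_cookies(cookie_file)
print(restored == cookies)
```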

That’s a very simple example, containing 2 meaningless cookies. We need to somehow derive a requests.Session object from this list of dictionaries. Unfortunately, the requests library does not store cookies in simple dicts. Instead, its cookie jar (session.cookies) holds http.cookiejar.Cookie objects.

After some brief searching in the CPython git repo, we can find the source code of the http.cookiejar.Cookie class and its constructor. To effectively copy these cookies, we’ll need to create a Cookie object from each of our cookie dicts, and then set each of those cookies in the new session. The same result can be achieved with the requests.cookies.create_cookie function, but I chose to use the standard constructor. My solution is written as follows:

import http.cookiejar

def generate_cookie(cookie_raw):
    """
    Creates an http.cookiejar.Cookie object, given raw cookie information as a dict.
    This dict must contain the following keys: name, value, domain, path, secure
    Parameters:
        cookie_raw (dict): The cookie information dictionary.
    Returns:
        http.cookiejar.Cookie: The generated cookie object.
    """
    # expiry is optional, so default it to None if not set
    # (False would be converted to timestamp 0, i.e. an already expired cookie)
    expiry = cookie_raw.get('expiry')
    # initialize Cookie object
    cookie = http.cookiejar.Cookie(
        0,                                      # version
        cookie_raw['name'],                     # name
        cookie_raw['value'],                    # value
        None,                                   # port
        False,                                  # port_specified
        cookie_raw['domain'],                   # domain
        True,                                   # domain_specified
        cookie_raw['domain'].startswith('.'),   # domain_initial_dot
        cookie_raw['path'],                     # path
        True,                                   # path_specified
        cookie_raw['secure'],                   # secure
        expiry,                                 # expires
        False,                                  # discard
        None,                                   # comment
        None,                                   # comment_url
        {},                                     # rest (must be a dict, not None)
        )
    return cookie

This block of code generates a single http.cookiejar.Cookie object from a dict. Logically, we can complete our goal by simply:

  • Getting the list of raw cookies (dict) from our driver.
  • Calling generate_cookie(dict) on each of those items to generate an http.cookiejar.Cookie object.
  • Setting these http.cookiejar.Cookie objects as cookies in a requests.Session instance using the session.cookies.set_cookie(cookie) method.
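
The three steps above can be tried without a browser or the requests library. The following is a minimal sketch under a few assumptions: the list raw_cookies stands in for the output of get_cookies, the helper name cookie_from_dict is made up for this example, and a plain http.cookiejar.CookieJar stands in for session.cookies (the jar used by requests is a subclass of it, so set_cookie is called the same way). It passes the constructor arguments by keyword, which is equivalent to the positional form.

```python
import http.cookiejar


def cookie_from_dict(cookie_raw):
    """Build an http.cookiejar.Cookie from a selenium-style cookie dict."""
    domain = cookie_raw["domain"]
    return http.cookiejar.Cookie(
        version=0,
        name=cookie_raw["name"],
        value=cookie_raw["value"],
        port=None,
        port_specified=False,
        domain=domain,
        domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path=cookie_raw["path"],
        path_specified=True,
        secure=cookie_raw["secure"],
        expires=cookie_raw.get("expiry"),  # None when the key is absent
        discard=False,
        comment=None,
        comment_url=None,
        rest={},
    )


# Step 1 (stand-in): placeholder data shaped like the output of get_cookies().
raw_cookies = [
    {"name": "cookie1", "value": "whatever", "path": "/", "domain": ".justin.ooo",
     "secure": True, "httpOnly": False, "expiry": 1590978394},
    {"name": "cookie2", "value": "doesn't matter", "path": "/", "domain": "justin.ooo",
     "secure": True, "httpOnly": False},
]

# Steps 2 and 3: convert each dict and store the result in a cookie jar.
jar = http.cookiejar.CookieJar()
for cookie_raw in raw_cookies:
    jar.set_cookie(cookie_from_dict(cookie_raw))

print(sorted(cookie.name for cookie in jar))
```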

Here is a working implementation of this logic, alongside a usage example:

import requests

def session_from_driver(browser):
    """
    Creates a 'requests.Session' object to make requests from.
    Automatically copies cookies from selenium driver into new session.
    Parameters:
        browser (selenium.webdriver): An instance of a selenium webdriver.
    Returns:
        requests.Session: A session containing cookies from the provided selenium.webdriver object.
    """
    cookies = browser.get_cookies()
    session = requests.Session()
    for cookie_raw in cookies:
        cookie = generate_cookie(cookie_raw)
        session.cookies.set_cookie(cookie)
    return session

# usage example (a separate script; assumes the functions above are saved in utils.py)
from selenium import webdriver
from selenium.webdriver.common.by import By
import utils
# initialize our webdriver
browser = webdriver.Firefox() # note: geckodriver is needed for this
# load the website's login page
browser.get("https://example.com/login.php")
# fill out username/password, click the 'submit' button
# (find_element(By...) replaces the find_element_by_* helpers removed in Selenium 4)
browser.find_element(By.ID, "username").send_keys("justin")
browser.find_element(By.ID, "password").send_keys("testing123")
browser.find_element(By.XPATH, "//input[@type='submit']").click()
# use our session_from_driver function to create a requests.Session
session = utils.session_from_driver(browser)
# access data that should only be available if we're logged in!
response = session.get("https://example.com/profile.php")
print(response)

This solution allows us to use a real browser when needed, and seamlessly switch to the requests library to send standard HTTP requests while retaining session information.
