This guide shows how to use Chrome to automate access to a private ERDDAP dataset that requires a Google login.
Prerequisites
- Install Required Python Libraries
Run the following command to install the necessary libraries:
```bash
pip install requests selenium webdriver_manager
```
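You can optionally confirm that the three packages are importable before running the main script. The helper below, check_installed, is hypothetical and not part of the original guide. It asks Python's import system whether each module can be found and does not actually import any of them.

```python
import importlib.util

# Import names of the packages installed above (hypothetical helper, not part of the guide)
REQUIRED_MODULES = ("requests", "selenium", "webdriver_manager")


def check_installed(modules=REQUIRED_MODULES):
    """Return a dict mapping each module name to True if Python can locate it."""
    # find_spec returns None for a top-level module that is not installed
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in check_installed().items():
        print(f"{name}: {'installed' if ok else 'MISSING'}")
```

If any module is reported as MISSING, rerun the pip command in the same Python environment you will use for the script.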
- Set Up WebDriver and Helper Functions
Selenium is a tool for automating web browsers such as Chrome, Firefox, and Edge. Here, we use it to open the ERDDAP login page and give you time to complete the Google login in that window. It then retrieves the resulting authentication cookies.
webdriver_manager automatically downloads and sets up the appropriate WebDriver (ChromeDriver) for your installed Chrome.
Helper Functions
To access the ERDDAP dataset, we define three helper functions:
- get_browser_cookies(url): Opens Chrome at the ERDDAP login page, waits while you sign in, and returns the browser's login cookies.
- authenticate_session(url): Uses the cookies from get_browser_cookies to create an authenticated requests session.
- download_data(session, file_url, output_filename): Uses the authenticated session to request the dataset URL and saves the response to the specified output file.
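The script in this guide passes the cookies to requests as a single raw Cookie header. An alternative is to load them into the session's cookie jar. As far as we know, requests rebuilds the Cookie header from its jar when it follows a redirect, so a header set only by hand may not survive a redirect. The jar also scopes each cookie to its domain and path. The sketch below assumes the input is the list of dicts returned by Selenium's driver.get_cookies(). The function name session_from_selenium_cookies is hypothetical.

```python
import requests


def session_from_selenium_cookies(selenium_cookies):
    """Build a requests.Session whose cookie jar holds the given Selenium cookies.

    selenium_cookies: the list of dicts from driver.get_cookies(); each dict has
    "name" and "value", and usually "domain", "path", and "secure".
    (Hypothetical helper, shown as an alternative to a raw Cookie header.)
    """
    session = requests.Session()
    for cookie in selenium_cookies:
        session.cookies.set(
            cookie["name"],
            cookie["value"],
            # requests expects strings here, so fall back to "" and "/" rather than None
            domain=cookie.get("domain", ""),
            path=cookie.get("path", "/"),
            secure=cookie.get("secure", False),
        )
    return session
```

To try this approach, have get_browser_cookies return the raw list from driver.get_cookies() instead of the formatted string. Then build the session with this helper inside authenticate_session.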
Code
Here is the code with comments for each part:
```python
import sys  # For exiting cleanly on failure
import time  # For pausing while the user logs in

import requests  # For managing sessions and HTTP requests
from selenium import webdriver  # For browser control
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager  # To manage the browser driver


def get_browser_cookies(url):
    """Open Chrome at the ERDDAP login page and return its cookies as a Cookie header string."""
    # Set Chrome options to suppress automation messages
    chrome_options = Options()
    chrome_options.add_argument("--disable-blink-features=AutomationControlled")  # Don't flag the browser as automated
    chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])  # Hide the "Chrome is being controlled" banner
    chrome_options.add_experimental_option("useAutomationExtension", False)  # Disable the default automation extension

    # Start the Chrome browser using webdriver-manager
    driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()), options=chrome_options)
    try:
        # Navigate to the login page
        driver.get(url)
        # Wait for the user to complete the Google login (adjust time as needed)
        time.sleep(60)
        # Retrieve cookies from the browser session
        cookies = driver.get_cookies()
        # Verify cookies were retrieved successfully
        if not cookies:
            raise ValueError(
                "No cookies retrieved. Possible causes:\n"
                "- Insufficient sleep time; try increasing the sleep duration.\n"
                "- Incompatible or missing WebDriver for Chrome."
            )
        # Format cookies as one header value: "name1=value1; name2=value2"
        formatted_cookies = "; ".join(f"{cookie['name']}={cookie['value']}" for cookie in cookies)
        print("Cookies retrieved. Attempting to access ERDDAP dataset...")
    finally:
        # Always close the browser, even if an error occurred
        driver.quit()
    return formatted_cookies


def download_data(session, file_url, output_filename):
    """Download a data file using the authenticated session."""
    response = session.get(file_url)
    if response.status_code == 200:
        print("Successfully downloaded data.")
        # Write the file content to disk in binary mode
        with open(output_filename, "wb") as f:
            f.write(response.content)
        print(f"Data saved to {output_filename}")
    else:
        print(f"Failed to download data. Status code: {response.status_code}")
        print("Response:", response.text)


def authenticate_session(url):
    """Log in through the browser and return an authenticated requests session."""
    session = requests.Session()
    # Get cookies from the ERDDAP login page using Selenium
    try:
        cookie_header = get_browser_cookies(url)
    except ValueError as e:
        print(e)
        sys.exit(1)
    # Send the browser cookies with every request made through this session
    session.headers.update({
        "User-Agent": "Mozilla/5.0",
        "Cookie": cookie_header,
    })
    return session


# Main Execution
if __name__ == "__main__":
    # ERDDAP login URL (through Google login)
    login_url = "https://polarwatch.noaa.gov/erddap/loginGoogle.html"
    # ERDDAP data URL (direct link to the dataset in .nc format); replace the placeholder
    data_url = "YOUR_ERDDAP_DATASET_URL"
    # Step 1: Authenticate the session
    session = authenticate_session(login_url)
    # Step 2: Download the data file and save it under your chosen filename
    download_data(session, data_url, "YOUR_FILENAME")
```
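The data_url placeholder expects a complete ERDDAP request URL. For a gridded (griddap) dataset, ERDDAP URLs follow the pattern base/griddap/datasetID.fileType?variable[(start):(stop)]..., with one bracketed range per dimension. Values in parentheses are read as actual axis values (for example, ISO 8601 times or degrees), not indices. The helper below, griddap_url, is hypothetical and only assembles that string. The dataset ID, variable name, and ranges in the test are placeholders, so substitute the ones shown on your dataset's ERDDAP page.

```python
def griddap_url(erddap_base, dataset_id, variable, constraints, file_type="nc"):
    """Build an ERDDAP griddap request URL (hypothetical helper).

    constraints: one (start, stop) pair per dimension, in the dataset's
    dimension order. Each value is wrapped in parentheses so ERDDAP treats
    it as an axis value rather than an index.
    """
    dims = "".join(f"[({start}):({stop})]" for start, stop in constraints)
    return f"{erddap_base.rstrip('/')}/griddap/{dataset_id}.{file_type}?{variable}{dims}"
```

Tabular (tabledap) datasets use a different query syntax, so this helper applies only to griddap datasets.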
Step-by-Step Usage
- Install Required Libraries: Make sure requests, selenium, and webdriver_manager are installed.
- Set Login URL and Data URL:
- The login_url is the ERDDAP login page (usually loginGoogle.html for Google login).
- The data_url points to the specific dataset you want to download.
- Run the Script:
- Run the script. It opens Chrome at the login page. Sign in with your Google account in that window before the wait expires. The script then retrieves the cookies and downloads the specified dataset.
- Adjust time.sleep(60) in get_browser_cookies to allow enough time for login if needed.
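A fixed time.sleep(60) either wastes time after a quick login or runs out before a slow one. An alternative is to poll until a "logged in" condition holds. Selenium's own WebDriverWait(driver, timeout).until(...) does this. The plain-Python sketch below keeps the logic visible and testable without a browser. The function wait_until is hypothetical. The login check in the comments is also an assumption: it presumes the ERDDAP page text contains "logged in as" after a successful sign-in, so confirm what your server actually displays.

```python
import time


def wait_until(condition, timeout=180.0, poll_interval=2.0):
    """Call condition() every poll_interval seconds until it returns a truthy value
    or timeout seconds pass. Returns True on success, False on timeout.
    (Hypothetical helper; a possible replacement for a fixed sleep.)"""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


# Possible use inside get_browser_cookies, in place of time.sleep(60)
# (assumes, hypothetically, that ERDDAP shows "logged in as" after login):
#
#     if not wait_until(lambda: "logged in as" in driver.page_source.lower()):
#         raise ValueError("Login not detected before the timeout expired.")
```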
Usage Notes
- Browser and Driver Compatibility: ChromeDriver must match your installed Chrome version. webdriver_manager downloads a matching ChromeDriver automatically, but Chrome itself must already be installed.
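- Alternative Browsers: This guide uses Chrome for simplicity, but Selenium also supports Firefox and Edge. To switch, replace webdriver.Chrome with webdriver.Firefox or webdriver.Edge and use the matching webdriver_manager class (GeckoDriverManager or EdgeChromiumDriverManager) and driver service. Then adapt the browser options, since the experimental options shown are Chromium-specific.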
- Error Handling: The script checks if cookies were successfully retrieved. If not, it suggests potential issues.
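A 200 status code does not by itself prove that the response is the dataset. If the cookies did not grant access, the server may return an HTML page instead, which download_data would save as if it were data. A lightweight extra check is to compare the first bytes of the response with the known netCDF file signatures. The function looks_like_netcdf is hypothetical, and it only inspects offset 0, which is where the netCDF library normally writes the signature.

```python
# Leading bytes ("magic numbers") of the netCDF file formats
NETCDF_SIGNATURES = (
    b"CDF\x01",            # netCDF classic
    b"CDF\x02",            # netCDF 64-bit offset
    b"CDF\x05",            # netCDF 64-bit data (CDF-5)
    b"\x89HDF\r\n\x1a\n",  # netCDF-4 (HDF5 container)
)


def looks_like_netcdf(content: bytes) -> bool:
    """Return True if content starts with a known netCDF signature (hypothetical helper)."""
    # bytes.startswith accepts a tuple and matches any of its entries
    return content.startswith(NETCDF_SIGNATURES)
```

In download_data, this could guard the write step, for example by saving only when response.status_code == 200 and looks_like_netcdf(response.content) are both true and printing response.text otherwise.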