Features
- OutofDateDatasets provides outOfDateDatasets.html
Run the following command to install the necessary libraries:
{bash}
pip install requests selenium webdriver_manager
- Setup WebDriver and Helper Functions
Selenium is a tool that allows automated control of web browsers (Chrome, Firefox, Edge, etc). Here, we use Seleminum to launch the ERDDAP login page, handle Google login, and retrieve the session cookies.
webdriver_manager
helps to automatically download and set up the appropriate WebDriver (in this case Chrome).
Helper Functions
To access the ERDDAP dataset, we define three helper functions:
get_browser_cookies(login_url)
: Opens Chrome to log into the ERDDAP server and retrieves the session cookies.authenticate_session(login_url)
: Uses the cookies from get_browser_cookies to create an authenticated session.download_data(session, data_url, outfile)
: Uses the authenticated session to access the dataset URL and save it to the specified output file.
Code
Here is the code with comments for each part:
{python}
import requests # For managing sessions and HTTP requests
from selenium import webdriver # For browser control
from webdriver_manager.chrome import ChromeDriverManager # To manage the browser driver
from selenium.webdriver.chrome.service import Service as ChromeService
from selenium.webdriver.chrome.options import Options
import time
def get_browser_cookies(url):
"""Retrieve cookies from the browser using Selenium."""
# Set Chrome options to suppress automation messages
chrome_options = Options()
chrome_options.add_argument("--disable-blink-features=AutomationControlled") # Disable automation message
chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"]) # Hide "Chrome is controlled" message
chrome_options.add_experimental_option("useAutomationExtension", False) # Disable the default automation extension
# Start the Chrome browser using webdriver-manager
driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()), options=chrome_options)
try:
# Navigate to the login page
driver.get(url)
# Wait for the user to complete login (adjust time as needed)
time.sleep(60)
# Retrieve cookies from the browser session
cookies = driver.get_cookies()
# Verify cookies were retrieved successfully
if not cookies:
raise ValueError(
"No cookies retrieved. Possible causes:\n"
"- Insufficient sleep time; try increasing the sleep duration.\n"
"- Incompatible or missing WebDriver for Chrome."
)
# Format cookies for session headers
formatted_cookies = "; ".join([f"{cookie['name']}={cookie['value']}" for cookie in cookies])
print("Cookies Retrieved. Attempting to Access ERDDAP Dataset..")
finally:
# Close the browser
driver.quit()
return formatted_cookies
def download_data(session, file_url, output_filename):
"""Download data file using the authenticated session."""
response = session.get(file_url)
if response.status_code == 200:
print("Successfully downloaded data.")
# Write the file content to disk
with open(output_filename, 'wb') as f:
f.write(response.content)
print(f"Data saved to {output_filename}")
else:
print(f"Failed to download data. Status code: {response.status_code}")
print("Response:", response.text)
def authenticate_session(url):
"""Authenticate session and return the session object."""
session = requests.Session()
# Get cookies from ERDDAP login page using Selenium
try:
cookie_header = get_browser_cookies(url)
except ValueError as e:
print(e)
exit(1)
# Set headers with cookies for authenticated requests
session.headers.update({
'User-Agent': 'Mozilla/5.0',
'Cookie': cookie_header
})
return session
# Main Execution
if __name__ == "__main__":
# ERDDAP login URL (through Google login)
login_url = "https://polarwatch.noaa.gov/erddap/loginGoogle.html"
# ERDDAP data URL (direct link to dataset in .nc format)
data_url = [YOUR_ERDDAP_DATASET_URL]
# Step 1: Authenticate Session
session = authenticate_session(login_url)
# Step 2: Download the data file and save as YOUR FILENAME
download_data(session, data_url, [FILENAME])
Step-by-Step Usage
- Install Required Libraries: Make sure requests, selenium, and webdriver_manager are installed.
- Set Login URL and Data URL:
- The login_url is the ERDDAP login page (usually loginGoogle.html for Google login).
- The data_url points to the specific dataset you want to download.
- Run the Script:
- Run the script to open Chrome, log in to ERDDAP, retrieve cookies, and download the specified dataset.
- Adjust time.sleep(60) in get_browser_cookies to allow enough time for login if needed.
Usage Notes
- Browser and Driver Compatibility: Ensure that ChromeDriver is compatible with your installed Chrome version. webdriver_manager handles this automatically, but Chrome must be installed.
- Alternative Browsers: This guide uses Chrome for simplicity, but Firefox and Edge are also supported with minor adjustments to the code.
- Error Handling: The script checks if cookies were successfully retrieved. If not, it suggests potential issues.