Poor correlation between Sentinel-1 OCN wind speed measurements and ERA5

ufcgeo · December 14, 2023, 1:51pm

Hello, I’ve been trying to calculate the linear correlation between the ERA5 global wind speeds (28 km resolution), and Metop level 2 wind speed data (12.5 km resolution), with the ocean wind field (OWI) product from the Sentinel-1 level 2 OCN dataset (1 km resolution). First I interpolate the two products and then select the closest points to the OWI product to calculate the correlation. I’m not altering the spatial resolution to match both products, simply selecting the closest point and calculating the correlation through Python. The results so far were 0.46, 0.32 and -0.7. It seems to be going very poorly for me; could someone give me an indication of how I could improve my method of analysis?

This is the code I’ve been using:

import geopandas as gpd
import pandas as pd
import numpy as np

# Calculate the wind speed and direction from the u and v components of the ERA5 file
def uv2idX(v, u):
    i = np.sqrt(u**2 + v**2)  # Calculate the intensity (wind speed)
    di = np.arctan2(v, u) * 180 / np.pi  # Math angle in degrees, counterclockwise from east
    di = (di + 360) % 360  # Adjust to the interval from 0 to 360 degrees
    # Convert to a compass bearing (clockwise from north). This is the direction the wind
    # blows toward; the meteorological "from" convention would be (270 - di) % 360.
    d = (90 - di) % 360
    return i, d

# Transform xarray Dataset in pandas DataFrame
ds_df = ds.to_dataframe()

# Reset the index of the DataFrame and adjust the longitude for an interval between -180 and 180
ds_df = ds_df.reset_index()
ds_df['longitude'] = (ds_df['longitude'] + 180) % 360 - 180

# Create a GeoDataFrame for geometry based on 'longitude' and 'latitude'
gdf = gpd.GeoDataFrame(ds_df, geometry=gpd.points_from_xy(ds_df['longitude'], ds_df['latitude']))

# Defining limits for data selection
lon_min, lon_max, lat_min, lat_max = -70, -10, -40, 10

# Selecting the data within those limits
selected_data = gdf[
    ((gdf['latitude'] >= lat_min) & (gdf['latitude'] <= lat_max)) &
    ((gdf['longitude'] >= lon_min) & (gdf['longitude'] <= lon_max))
]

# Calculate the wind speed for the selected data
u10_selected = selected_data['u10']
v10_selected = selected_data['v10']

# Calculate the wind speed using the function uv2idX
wind_speed_values, wind_dir_values = uv2idX(v10_selected.values, u10_selected.values)

# Print the wind speed and direction values
print(wind_speed_values)
print(wind_dir_values)

# Supposing 'wind_speed_values' and 'wind_dir_values' represent the results of uv2idX
# Supposing 'latitudes' and 'longitudes' belong to the GeoDataFrame of wind speed

# Reducing the wind speed arrays to match the arrays of OWI
wind_speed_values_resized = wind_speed_values[:len(latitudes)]
wind_dir_values_resized = wind_dir_values[:len(longitudes)]

# Create a DataFrame with the coordinates and wind velocity for Wind Speed
data_wind_speed = {
    'Latitude': latitudes,
    'Longitude': longitudes,
    'Wind_Speed': wind_speed_values_resized,
    'Wind_Direction': wind_dir_values_resized
}
df_wind_speed = pd.DataFrame(data_wind_speed)

# Assuming 'owiSpeed', 'owiDir', 'lat' and 'lon' are OWI data
# Reduce the size of the OWI arrays to match the arrays of Wind Speed
owi_speed_resized = owiSpeed[:len(lat)]
owi_dir_resized = owiDir[:len(lon)]

# Create a Dataframe with wind speed and coordinates from OWI
data_owi = {
    'Latitude': lat.flatten(),
    'Longitude': lon.flatten(),
    'OWI_Speed': owi_speed_resized.flatten(),
    'OWI_Direction': owi_dir_resized.flatten()
}
df_owi = pd.DataFrame(data_owi)

# Convert DataFrames to GeoDataFrames
gdf_wind_speed = gpd.GeoDataFrame(df_wind_speed, geometry=gpd.points_from_xy(df_wind_speed['Longitude'], df_wind_speed['Latitude']))
gdf_owi = gpd.GeoDataFrame(df_owi, geometry=gpd.points_from_xy(df_owi['Longitude'], df_owi['Latitude']))

# Add a column of id to distinguish both datasets
gdf_wind_speed['Source'] = 'Wind_Speed'
gdf_owi['Source'] = 'OWI'

# Concatenate the GeoDataFrames
combined_gdf = pd.concat([gdf_wind_speed, gdf_owi])

from shapely.geometry import Point

# Assuming df_wind_speed_unique contains unique Wind_Speed data and gdf_owi_filtered contains OWI_Speed data

# Drop duplicates from Wind_Speed DataFrame based on coordinates
df_wind_speed_unique = df_wind_speed.drop_duplicates(subset=['Latitude', 'Longitude']).reset_index(drop=True)

# Collect matched rows in a list (DataFrame.append was removed in pandas 2.0)
matched_rows = []

# Loop through unique coordinates in Wind_Speed and find the nearest point in OWI_Speed
for index, row in df_wind_speed_unique.iterrows():
    # Use new names so the OWI 'lat' and 'lon' arrays are not overwritten
    pt_lat = row['Latitude']
    pt_lon = row['Longitude']
    # Find the closest point in gdf_owi_filtered
    closest_owi_point_idx = gdf_owi_filtered.distance(Point(pt_lon, pt_lat)).idxmin()
    matched_rows.append(gdf_owi_filtered.loc[closest_owi_point_idx])

# Build the DataFrame of matched points once, after the loop
matched_points = pd.DataFrame(matched_rows).reset_index(drop=True)

# Calculate correlation between matched Wind_Speed and OWI_Speed values
correlation = np.corrcoef(df_wind_speed_unique['Wind_Speed'], matched_points['OWI_Speed'])[0, 1]

print(f"Correlation Coefficient between Wind_Speed and OWI_Speed: {correlation}")


ghajduch · December 19, 2023, 12:07pm

Hi,

Are you sure that this reduces the size of your arrays as you expect?

# Reducing the wind speed arrays to match the arrays of OWI

wind_speed_values_resized = wind_speed_values[:len(latitudes)]
wind_dir_values_resized = wind_dir_values[:len(longitudes)]

Here you are selecting the first len(latitudes) elements of wind_speed_values and the first len(longitudes) elements of wind_dir_values.

Either longitudes and latitudes have the same size, or they don’t (for instance, if they hold only the unique longitude and latitude values), in which case the two resized arrays would generally end up with different lengths.

You should probably first display the ERA5 wind and the OWI wind you are comparing, to check that you have selected the proper AOI on both.
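
To illustrate the concern about slicing, here is a minimal sketch, assuming both datasets are available as flat arrays of latitude, longitude and value. The helper `nearest_index` and the toy values are hypothetical and not taken from the original post. It pairs points by position rather than by array order, using a brute-force haversine search in plain NumPy.

```python
import numpy as np

def nearest_index(src_lat, src_lon, tgt_lat, tgt_lon):
    """For each target point, return the index of the closest source point.

    Hypothetical helper: brute-force great-circle search, fine for small
    arrays; a spatial index would be preferable for full scenes.
    """
    src_lat = np.radians(np.asarray(src_lat, dtype=float))
    src_lon = np.radians(np.asarray(src_lon, dtype=float))
    tgt_lat = np.radians(np.asarray(tgt_lat, dtype=float))
    tgt_lon = np.radians(np.asarray(tgt_lon, dtype=float))
    # Broadcast to shape (n_target, n_source)
    dlat = src_lat[None, :] - tgt_lat[:, None]
    dlon = src_lon[None, :] - tgt_lon[:, None]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(tgt_lat)[:, None] * np.cos(src_lat)[None, :] * np.sin(dlon / 2) ** 2)
    # The haversine distance grows monotonically with a, so argmin(a) is enough
    return np.argmin(a, axis=1)

# Toy reference grid (made-up values), flattened to 1-D point lists
era_lat = np.array([0.0, 0.0, 1.0, 1.0])
era_lon = np.array([-40.0, -39.0, -40.0, -39.0])
era_speed = np.array([5.0, 6.0, 7.0, 8.0])

# Toy OWI points (made-up values)
owi_lat = np.array([0.9, 0.1])
owi_lon = np.array([-39.1, -39.9])

idx = nearest_index(era_lat, era_lon, owi_lat, owi_lon)
era_at_owi = era_speed[idx]                 # paired by position
era_truncated = era_speed[:len(owi_lat)]    # paired by array order only

print("nearest-neighbour pairing:", era_at_owi)
print("slicing by length:        ", era_truncated)
```

In this toy case the two approaches return different reference values for the same OWI points, which is the kind of mismatch that plotting both fields over the AOI should reveal.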

cpeureux · December 20, 2023, 1:52pm

Hello,

On how to improve the method of analysis, i.e. the correlation coefficient:

The typical correlation coefficient observed in Sentinel-1 CAL/VAL is around 0.9 (see illustration below; source: official S1 Mission Performance Center (MPC) CAL/VAL), when computed over colocated hourly ECMWF 0.1° reanalysis wind speed and Sentinel-1 wind speed (from L2 data) averaged at 0.1°.

[Image: scatter plot from the S1 MPC CAL/VAL, Sentinel-1 wind speed against colocated ECMWF wind speed]

It seems that two elements could limit the comparison performance in your case:
- A space/time separation between Sentinel-1 and the reference data, i.e. ERA5 and Metop. Regarding spatial separation, colocating S1 and reference data instead of looking for the closest point would probably improve the comparison. Regarding time separation, reducing the time gap between colocated measurements would likewise benefit your analysis. To our knowledge, ERA5 has an hourly time resolution, which leads to a maximum time separation of 30 minutes from Sentinel-1 measurements, so the comparison between ERA5 and S1 data should be favorable. However, to our knowledge (internal MPC documents), colocations with Metop are less favorable and occur at mid latitudes typically within 3 h at least of a Sentinel-1 measurement. It is therefore hard to tell whether you can improve your comparisons between S1 and Metop.
- A comparable pixel size: once colocated, S1 and reference wind vector measurements should be compared at the same resolution. S1 data resolution is high (1 km) while the reference data resolution is coarse (>10 km). We recommend that you average the S1 wind vector data to the coarser resolution of the reference data, instead of interpolating the reference data to 1 km resolution.
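
The two points above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the 0.25° cell size, the 30-minute threshold and all sample values are hypothetical, and the S1 wind is averaged through its vector components, which is one possible reading of "average S1 wind vector data" (averaging the scalar speed is another). The speed of the mean vector does not depend on the direction convention used to build the components.

```python
import numpy as np

def nearest_time_index(ref_times, s1_time, max_sep=np.timedelta64(30, "m")):
    """Index of the reference time step closest to the S1 acquisition time,
    or None if the separation exceeds max_sep (assumed threshold)."""
    sep = np.abs(ref_times - s1_time)
    i = int(np.argmin(sep))
    return i if sep[i] <= max_sep else None

def bin_average_wind(lat, lon, speed, direction_deg, cell_deg=0.25):
    """Average high-resolution wind vectors onto a regular lat/lon grid.

    Returns cell-centre latitudes, cell-centre longitudes, the speed of the
    mean wind vector in each occupied cell, and the number of points per cell.
    """
    lat = np.asarray(lat, dtype=float).ravel()
    lon = np.asarray(lon, dtype=float).ravel()
    speed = np.asarray(speed, dtype=float).ravel()
    theta = np.radians(np.asarray(direction_deg, dtype=float).ravel())
    # Vector components of each high-resolution wind measurement
    u = speed * np.sin(theta)
    v = speed * np.cos(theta)
    # Integer indices of the coarse cell that contains each point
    iy = np.floor(lat / cell_deg).astype(int)
    ix = np.floor(lon / cell_deg).astype(int)
    cells, inverse = np.unique(np.stack([iy, ix], axis=1), axis=0, return_inverse=True)
    inverse = inverse.ravel()
    n = np.bincount(inverse, minlength=len(cells))
    u_mean = np.bincount(inverse, weights=u, minlength=len(cells)) / n
    v_mean = np.bincount(inverse, weights=v, minlength=len(cells)) / n
    centre_lat = (cells[:, 0] + 0.5) * cell_deg
    centre_lon = (cells[:, 1] + 0.5) * cell_deg
    return centre_lat, centre_lon, np.hypot(u_mean, v_mean), n

# Time colocation: pick the hourly reference step closest to the S1 acquisition
ref_times = np.array(["2023-12-14T08:00", "2023-12-14T09:00", "2023-12-14T10:00"],
                     dtype="datetime64[m]")
t_idx = nearest_time_index(ref_times, np.datetime64("2023-12-14T08:50"))

# Spatial averaging: five made-up S1 points falling into two 0.25° cells
s1_lat = [0.05, 0.10, 0.15, 0.20, 0.30]
s1_lon = [-39.95, -39.90, -39.85, -39.80, -39.90]
s1_speed = [4.0, 6.0, 8.0, 10.0, 5.0]
s1_dir = [90.0, 90.0, 90.0, 90.0, 90.0]
c_lat, c_lon, c_speed, c_n = bin_average_wind(s1_lat, s1_lon, s1_speed, s1_dir)

print("reference time step:", ref_times[t_idx])
print("cell centres:", list(zip(c_lat, c_lon)))
print("averaged speed per cell:", c_speed, "points per cell:", c_n)
```

The averaged S1 values can then be compared with the reference value of the cell they fall into, so that both sides of the correlation are at the same resolution.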

We hope these explanations are clear enough and will help you improve your analysis method. In any case, we remain available to answer your questions.