Hello, I've been trying to calculate the linear correlation between wind speeds from ERA5 (global, 28 km resolution) and MetOp Level 2 (12.5 km resolution) and the ocean wind field (OWI) product from the Sentinel-1 Level 2 OCN dataset (1 km resolution). First I interpolate the two products, then I select the points closest to the OWI product and calculate the correlation. I'm not altering the spatial resolution to match the products; I simply select the closest point and calculate the correlation in Python. The results so far were 0.46, 0.32 and -0.7. It seems to be going very poorly. Could someone give me an indication of how I could improve my method of analysis?
This is the code I've been using:
import geopandas as gpd
import pandas as pd
import numpy as np
# Calculate wind speed and direction from the ERA5 u and v components
def uv2idX(v, u):
    i = np.sqrt(u**2 + v**2)  # Wind speed (intensity)
    di = np.arctan2(v, u) * 180 / np.pi  # Mathematical angle in degrees (counterclockwise from east)
    di = (di + 360) % 360  # Wrap to the interval from 0 to 360 degrees
    d = (90 - di) % 360  # Convert to a compass bearing (clockwise from north, direction the wind blows toward)
    return i, d
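A note on the direction returned above: `(90 - di) % 360` is the compass bearing the wind blows toward. If the product it is compared with reports the direction the wind comes from (the usual meteorological convention), the two differ by 180 degrees. The sketch below is only an illustration of that difference; `uv_to_speed_dir` is a hypothetical helper name and the sample values are made up.

```python
import numpy as np

def uv_to_speed_dir(u, v):
    """Return speed plus two compass directions (degrees clockwise from north)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    speed = np.hypot(u, v)
    # Bearing the wind blows toward; equivalent to (90 - atan2(v, u)) % 360
    dir_to = np.degrees(np.arctan2(u, v)) % 360
    # Meteorological convention: bearing the wind blows from
    dir_from = (dir_to + 180) % 360
    return speed, dir_to, dir_from

# Made-up samples: a westerly wind (u=10, v=0) and a southerly wind (u=0, v=10)
speed, dir_to, dir_from = uv_to_speed_dir([10.0, 0.0], [0.0, 10.0])
print(speed, dir_to, dir_from)
```

This only affects the direction columns; the speed used for the correlation is the same under both conventions.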
# Convert the xarray Dataset to a pandas DataFrame
ds_df = ds.to_dataframe()
# Reset the index of the DataFrame and wrap the longitude to the interval from -180 to 180
ds_df = ds_df.reset_index()
ds_df['longitude'] = (ds_df['longitude'] + 180) % 360 - 180
# Create a GeoDataFrame for geometry based on 'longitude' and 'latitude'
gdf = gpd.GeoDataFrame(ds_df, geometry=gpd.points_from_xy(ds_df['longitude'], ds_df['latitude']))
# Defining limits for data selection
lon_min, lon_max, lat_min, lat_max = -70, -10, -40, 10
# Selecting the data within those limits
selected_data = gdf[
    ((gdf['latitude'] >= lat_min) & (gdf['latitude'] <= lat_max)) &
    ((gdf['longitude'] >= lon_min) & (gdf['longitude'] <= lon_max))
]
# Extract the u and v components for the selected data
u10_selected = selected_data['u10']
v10_selected = selected_data['v10']
# Calculate the wind speed and direction using the function uv2idX
wind_speed_values, wind_dir_values = uv2idX(v10_selected.values, u10_selected.values)
# Print the wind speed and direction values
print(wind_speed_values)
print(wind_dir_values)
# Assuming 'wind_speed_values' and 'wind_dir_values' are the results of uv2idX
# Assuming 'latitudes' and 'longitudes' belong to the GeoDataFrame of wind speed
# Truncate the wind speed arrays to match the arrays of OWI
wind_speed_values_resized = wind_speed_values[:len(latitudes)]
wind_dir_values_resized = wind_dir_values[:len(longitudes)]
# Create a DataFrame with the coordinates and wind velocity for Wind Speed
data_wind_speed = {
    'Latitude': latitudes,
    'Longitude': longitudes,
    'Wind_Speed': wind_speed_values_resized,
    'Wind_Direction': wind_dir_values_resized
}
df_wind_speed = pd.DataFrame(data_wind_speed)
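One thing worth checking at this step: slicing the speed array to the length of a separate coordinate array only makes the lengths agree, it does not guarantee that each speed sits next to the position it was computed for. A minimal sketch of one way to keep them aligned, assuming the coordinates are read from the same rows as `u10` and `v10`. The small `selected` table is a made-up stand-in for `selected_data`, and `df_era5` is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for `selected_data` (ERA5 rows inside the bounding box)
selected = pd.DataFrame({
    "latitude":  [-10.0, -10.0, -10.25],
    "longitude": [-40.0, -39.75, -40.0],
    "u10": [3.0, 0.0, -6.0],
    "v10": [4.0, 5.0, 8.0],
})

# Take coordinates and wind components from the same rows, so every
# speed stays paired with the grid point it was computed for
df_era5 = pd.DataFrame({
    "Latitude": selected["latitude"].to_numpy(),
    "Longitude": selected["longitude"].to_numpy(),
    "Wind_Speed": np.hypot(selected["u10"], selected["v10"]).to_numpy(),
})
print(df_era5)
```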
# Assuming 'owiSpeed', 'owiDir', 'lat' and 'lon' are OWI data
# Reduce the size of the OWI arrays to match the arrays of Wind Speed
owi_speed_resized = owiSpeed[:len(lat)]
owi_dir_resized = owiDir[:len(lon)]
# Create a Dataframe with wind speed and coordinates from OWI
data_owi = {
    'Latitude': lat.flatten(),
    'Longitude': lon.flatten(),
    'OWI_Speed': owi_speed_resized.flatten(),
    'OWI_Direction': owi_dir_resized.flatten()
}
df_owi = pd.DataFrame(data_owi)
# Convert DataFrames to GeoDataFrames
gdf_wind_speed = gpd.GeoDataFrame(df_wind_speed, geometry=gpd.points_from_xy(df_wind_speed['Longitude'], df_wind_speed['Latitude']))
gdf_owi = gpd.GeoDataFrame(df_owi, geometry=gpd.points_from_xy(df_owi['Longitude'], df_owi['Latitude']))
# Add a source column to distinguish the two datasets
gdf_wind_speed['Source'] = 'Wind_Speed'
gdf_owi['Source'] = 'OWI'
# Concatenate the GeoDataFrames
combined_gdf = pd.concat([gdf_wind_speed, gdf_owi])
from shapely.geometry import Point
# Assuming df_wind_speed_unique contains unique Wind_Speed data and gdf_owi_filtered contains OWI_Speed data
# Drop duplicates from Wind_Speed DataFrame based on coordinates
df_wind_speed_unique = df_wind_speed.drop_duplicates(subset=['Latitude', 'Longitude']).reset_index(drop=True)
# List to store matched points (DataFrame.append was removed in pandas 2.0)
matched_rows = []
# Loop through unique coordinates in Wind_Speed and find the nearest point in OWI_Speed
for index, row in df_wind_speed_unique.iterrows():
    # Use new names so the OWI 'lat' and 'lon' arrays are not overwritten
    point_lat = row['Latitude']
    point_lon = row['Longitude']
    # Find the closest point in gdf_owi_filtered
    closest_owi_point_idx = gdf_owi_filtered.distance(Point(point_lon, point_lat)).idxmin()
    closest_owi_point = gdf_owi_filtered.loc[closest_owi_point_idx]
    matched_rows.append(closest_owi_point)
matched_points = pd.DataFrame(matched_rows)
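The row-by-row search above always returns a match, even for points far outside the Sentinel-1 scene, and it can be slow for a 1 km product. A possible alternative, sketched here under the assumption that SciPy is available, uses `scipy.spatial.cKDTree` with a maximum search distance. `match_nearest` and `max_dist_deg` are hypothetical names, distances are in degrees (a planar approximation), and the coordinates are made up.

```python
import numpy as np
from scipy.spatial import cKDTree

def match_nearest(coarse_lon, coarse_lat, fine_lon, fine_lat, max_dist_deg):
    """For each coarse point, find the nearest fine point within max_dist_deg.

    Returns two index arrays: positions of the coarse points that found a
    match, and the positions of their matched fine points.
    """
    tree = cKDTree(np.column_stack([fine_lon, fine_lat]))
    dist, idx = tree.query(np.column_stack([coarse_lon, coarse_lat]),
                           k=1, distance_upper_bound=max_dist_deg)
    # Points with no neighbour inside the radius come back with dist = inf
    keep = np.isfinite(dist)
    return np.nonzero(keep)[0], idx[keep]

# Made-up example: three fine pixels and two coarse points, one far away
fine_lon = np.array([-40.00, -39.99, -39.98])
fine_lat = np.array([-10.0, -10.0, -10.0])
coarse_lon = np.array([-39.99, -30.0])
coarse_lat = np.array([-10.001, -10.0])

coarse_idx, fine_idx = match_nearest(coarse_lon, coarse_lat,
                                     fine_lon, fine_lat, max_dist_deg=0.05)
print(coarse_idx, fine_idx)
```

Only the coarse points listed in `coarse_idx` would then enter the correlation, paired with the fine values at `fine_idx`.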
# Calculate correlation between matched Wind_Speed and OWI_Speed values
correlation = np.corrcoef(df_wind_speed_unique['Wind_Speed'], matched_points['OWI_Speed'])[0, 1]
print(f"Correlation Coefficient between Wind_Speed and OWI_Speed: {correlation}")
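Since a single 1 km pixel contains small-scale variability that a 12.5 km or 28 km cell averages out, one option that may help is to average all fine pixels that fall nearest to each coarse point before correlating, instead of picking one pixel. The sketch below is a minimal illustration under that assumption; `average_fine_onto_coarse` and `max_dist_deg` are hypothetical names and all values are made up.

```python
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree

def average_fine_onto_coarse(coarse_lon, coarse_lat, fine_lon, fine_lat,
                             fine_val, max_dist_deg):
    """Mean of the fine values assigned to their nearest coarse point.

    Returns a Series indexed by coarse point position.
    """
    fine_val = np.asarray(fine_val, dtype=float)
    tree = cKDTree(np.column_stack([coarse_lon, coarse_lat]))
    dist, idx = tree.query(np.column_stack([fine_lon, fine_lat]),
                           distance_upper_bound=max_dist_deg)
    # Drop fine pixels that are too far from any coarse point or are NaN
    ok = np.isfinite(dist) & np.isfinite(fine_val)
    return pd.Series(fine_val[ok]).groupby(idx[ok]).mean()

# Made-up example: three coarse points, two fine pixels around each
coarse_lon = np.array([0.0, 1.0, 2.0])
coarse_lat = np.zeros(3)
coarse_speed = np.array([5.0, 7.0, 9.0])
fine_lon = np.array([-0.1, 0.1, 0.9, 1.1, 1.9, 2.1])
fine_lat = np.zeros(6)
fine_speed = np.array([4.0, 6.0, 6.0, 8.0, 9.0, 11.0])

fine_mean = average_fine_onto_coarse(coarse_lon, coarse_lat,
                                     fine_lon, fine_lat, fine_speed,
                                     max_dist_deg=0.5)
# Correlate only the coarse points that received at least one fine pixel
r = np.corrcoef(coarse_speed[fine_mean.index.to_numpy()],
                fine_mean.to_numpy())[0, 1]
print(fine_mean.to_numpy(), r)
```

Reporting the bias and RMSE alongside the correlation coefficient can also help separate a systematic offset from random scatter.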