Python raises ValueError: could not convert string to float

❓ I have a pandas DataFrame with date columns. The data was imported from a csv file. When I try to fit a regression model, I get the error ValueError: could not convert string to float: '2019-08-30 07:51:21'. How can I get rid of it?

Here is the data:

    event_id   tsm_id              rssi_ts  rssi   batl              batl_ts   ts_diff
0  417736018  4317714  2019-09-05 20:00:07   140  100.0  2019-09-05 18:11:49  01:48:18
1  417735986  4317714  2019-09-05 20:00:07   132  100.0  2019-09-05 18:11:49  01:48:18
2  418039386  4317714  2019-09-06 01:00:08   142  100.0  2019-09-06 00:11:50  00:48:18
3  418039385  4317714  2019-09-06 01:00:08   122  100.0  2019-09-06 00:11:50  00:48:18
4  420388010  4317714  2019-09-07 15:31:07   143  100.0  2019-09-07 12:11:50  03:19:17

Here is my code:

import numpy as np
import pandas as pd

model = pd.read_csv("source.csv")
model.describe()

           event_id        tsm_id         rssi         batl
count  5.000000e+03  5.000000e+03  5000.000000  3784.000000
mean   3.982413e+08  4.313492e+06   168.417200    94.364429
std    2.200899e+07  2.143570e+03    35.319516    13.609917
min    3.443084e+08  4.310312e+06     0.000000    16.000000
25%    3.852882e+08  4.310315e+06   144.000000    97.000000
50%    4.007999e+08  4.314806e+06   170.000000   100.000000
75%    4.171803e+08  4.314815e+06   195.000000   100.000000
max    4.258451e+08  4.317714e+06   242.000000   100.000000

# Target: battery level; features: every other column (the date columns are still strings here)
labels_b = np.array(model['batl'])
features_r = model.drop('batl', axis=1)
features_r = np.array(features_r)

from sklearn.model_selection import train_test_split
train_features, test_features, train_labels, test_labels = train_test_split(
    features_r, labels_b, test_size=0.25, random_state=42)

from sklearn.ensemble import RandomForestRegressor
rf = RandomForestRegressor(n_estimators = 1000, random_state = 42)
rf.fit(train_features, train_labels);

Here is the error message:

ValueError                                Traceback (most recent call last)
<ipython-input-28-bc774a9d8239> in <module>
4 rf = RandomForestRegressor(n_estimators = 1000, random_state = 42)
5 # Train the model on training data
----> 6 rf.fit(train_features, train_labels);

~/ml/env/lib/python3.7/site-packages/sklearn/ensemble/forest.py in fit(self, X, y, sample_weight)
247
248 # Validate or convert input data
--> 249 X = check_array(X, accept_sparse="csc", dtype=DTYPE)
250 y = check_array(y, accept_sparse='csc', ensure_2d=False, dtype=None)
251 if sample_weight is not None:

~/ml/env/lib/python3.7/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
494 try:
495 warnings.simplefilter('error', ComplexWarning)
--> 496 array = np.asarray(array, dtype=dtype, order=order)
497 except ComplexWarning:
498 raise ValueError("Complex data not supported\n"

~/ml/env/lib/python3.7/site-packages/numpy/core/numeric.py in asarray(a, dtype, order)
536
537 """ """
--> 538 return array(a, dtype, copy=False, order=order)
539
540

ValueError: could not convert string to float: '2019-08-30 07:51:21'

✔️ You have to convert the datetimes from strings to pandas timestamps. This can be done with the following lines (everything else stays as you wrote it):

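import pandas as pd

model = (
    # parse_dates makes read_csv return datetime64 columns instead of strings
    pd.read_csv("source.csv", parse_dates=['rssi_ts', 'batl_ts'])
    .assign(
        # for datetime64[ns], the int64 value is nanoseconds since the Unix epoch; divide to get seconds
        rssi_ts=lambda x: x['rssi_ts'].astype('int64') / 10 ** 9,
        batl_ts=lambda x: x['batl_ts'].astype('int64') / 10 ** 9,
        # 'HH:MM:SS' strings -> timedelta64[ns] -> seconds
        ts_diff=lambda x: pd.to_timedelta(x['ts_diff']).astype('int64') / 10 ** 9,
    )
)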

The timestamp objects created by the parse_dates argument can then be converted to float.
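
As a variation on the same idea, here is a minimal sketch that does not depend on the timestamps being stored with nanosecond resolution: it subtracts the Unix epoch and divides by one second, which yields float seconds directly (missing timestamps would become NaN). The inline two-row sample assumes source.csv is comma-separated with the columns shown in the question; csv_text, EPOCH and ONE_SECOND are illustrative names, not part of the original code.

```python
import io

import pandas as pd

# Hypothetical two-row sample, assuming source.csv uses this comma-separated layout.
csv_text = """event_id,tsm_id,rssi_ts,rssi,batl,batl_ts,ts_diff
417736018,4317714,2019-09-05 20:00:07,140,100.0,2019-09-05 18:11:49,01:48:18
418039386,4317714,2019-09-06 01:00:08,142,100.0,2019-09-06 00:11:50,00:48:18
"""

EPOCH = pd.Timestamp("1970-01-01")
ONE_SECOND = pd.Timedelta(seconds=1)

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["rssi_ts", "batl_ts"])

# Timestamp minus epoch is a timedelta; dividing by one second gives float seconds.
df["rssi_ts"] = (df["rssi_ts"] - EPOCH) / ONE_SECOND
df["batl_ts"] = (df["batl_ts"] - EPOCH) / ONE_SECOND
# "HH:MM:SS" strings -> timedelta -> float seconds.
df["ts_diff"] = pd.to_timedelta(df["ts_diff"]).dt.total_seconds()

# Any column listed here is still non-numeric and would trigger the same ValueError.
remaining_non_numeric = df.select_dtypes(exclude="number").columns.tolist()
print(remaining_non_numeric)
print(df.dtypes)
```

Checking for leftover non-numeric columns before calling rf.fit is a cheap way to confirm that every column can be cast to float.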