Using sklearn's decision tree model to predict real (unlabeled) data in Python

The source of the problem: every example, both in the book and online, splits a dataset with known labels into a training set and a test set, then reports the accuracy of the predictions against the true values, and stops there. My (perhaps naive) question: I have a dataset, and now I want to predict results for data that has no results yet. So the question is: how do I handle the data to be predicted, and how do I do feature engineering on it together with the training set and the test set? In other words, how do I do feature engineering jointly on the dataset without known results and the data used for training, so that I can then obtain the predicted results?

"""2 """
-sharp
x_train,x_test,y_train,y_test= train_test_split(x,y,test_size=0.01)


-sharp
dict= DictVectorizer(sparse=False)
-sharp  
x_train = dict.fit_transform(x_train.to_dict(orient="records"))
print(dict.get_feature_names())
x_test = dict.transform(x_test.to_dict(orient="records"))
print(x_train)

"""3"""
-sharp,
dec = DecisionTreeClassifier(max_depth=12,min_samples_leaf=1)
-sharp 
dec.fit(x_train,y_train)


-sharp  
y_predict = dec.predict(x_test)-sharp-sharp-sharp 
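For the step the question is actually asking about, here is a minimal sketch of one common approach, assuming the rows without results are loaded into a pandas DataFrame called x_new with the same feature columns as x (the name x_new is hypothetical): reuse the DictVectorizer that was fitted on the training set rather than fitting it again, and pass the transformed rows to the trained tree.

# x_new is a hypothetical DataFrame holding the rows without known results,
# with the same feature columns as the labeled data x
x_new_vec = dict.transform(x_new.to_dict(orient="records"))  # transform only, no refit
y_new = dec.predict(x_new_vec)                               # predicted results for x_new
print(y_new)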




What does it mean to "predict the results of data without results"? You always need training data to train your model, and that training data consists of both X and y. Once the model is trained, predicting the data without results simply means feeding its feature rows (X only) to the trained model.
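To make that concrete, here is a minimal self-contained sketch; the file names labeled.csv and unlabeled.csv and the column name "label" are all hypothetical. The model is fitted on rows that have both features and a label, and the rows without results are transformed with the same fitted vectorizer and passed to predict.

import pandas as pd
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# hypothetical files: labeled.csv has feature columns plus a "label" column,
# unlabeled.csv has the same feature columns but no "label"
labeled = pd.read_csv("labeled.csv")
unlabeled = pd.read_csv("unlabeled.csv")

x = labeled.drop(columns=["label"])
y = labeled["label"]
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25)

vec = DictVectorizer(sparse=False)
x_train = vec.fit_transform(x_train.to_dict(orient="records"))  # fit on training data only
x_test = vec.transform(x_test.to_dict(orient="records"))

dec = DecisionTreeClassifier(max_depth=12, min_samples_leaf=1)
dec.fit(x_train, y_train)                        # training needs both X and y
print("test accuracy:", dec.score(x_test, y_test))

# the unlabeled rows only provide X; predict their missing labels
x_unlabeled = vec.transform(unlabeled.to_dict(orient="records"))
print(dec.predict(x_unlabeled))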
