I'm using scikit-learn's train_test_split functionality and am getting different results when running the same code repeatedly:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1, random_state=42)
When I log the number of unique elements in
I get different values on repeated runs (with no code changes). I would have thought the random_state would ensure a deterministic split.
How can I ensure the same split each time?
The value you set random_state to (42 is used in many scikit-learn examples) does not really matter. What matters is that the value is always the same, so that repeated runs on the same input produce the same split and you can validate your code multiple times.
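As a quick sanity check, here is a minimal sketch showing that a fixed random_state gives the same split on repeated calls, whatever the value is. The toy arrays x and y and the helper split are made up for illustration and stand in for the data in the question.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the question's x and y.
x = np.arange(100).reshape(50, 2)
y = np.arange(50)

def split(seed):
    """Split the same x and y with the given random_state."""
    return train_test_split(x, y, test_size=0.1, random_state=seed)

# Two calls with the same seed on the same input.
first = split(42)
second = split(42)
same_seed_matches = all(np.array_equal(a, b) for a, b in zip(first, second))

# A different seed is just as repeatable; only the particular split changes.
other = split(7)
other_seed_repeats = all(np.array_equal(a, b) for a, b in zip(other, split(7)))

print(same_seed_matches, other_seed_repeats)
```

If both flags are True on your machine but your real pipeline still varies between runs, the split itself is probably not the source of the difference.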
There might be some other source of randomness in your code that produces the different results. Could you post your complete code?