Scikit-learn: train/test split not reproducible

I'm using scikit-learn's train_test_split functionality and am getting different results when running the same code repeatedly:

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1, random_state=42)

When I log the number of unique elements in y_train:

logger.info(len(set(y_train)))

I get different values on repeated runs (with no code changes). I would have thought the random_state would ensure a deterministic split.

How can I ensure the same split each time?

1 Answer

The particular value you pass as random_state does not really matter (42 is just the one used in many scikit-learn examples). What matters is that the value is the same on every run, so that you get the same split each time and can validate your code repeatedly.
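As a minimal sketch of this point, the snippet below splits the same toy data twice with the same seed and compares the results. The arrays and the helper name split are made up for illustration and are not taken from the question; only train_test_split and its test_size and random_state parameters come from scikit-learn.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data standing in for the asker's x and y (hypothetical values).
x = np.arange(100).reshape(50, 2)
y = np.arange(50) % 5


def split(features, labels, seed):
    """Return the four arrays from train_test_split for a given seed."""
    return train_test_split(features, labels, test_size=0.1, random_state=seed)


# Same inputs and same seed: the two splits are element-for-element identical.
first = split(x, y, seed=42)
second = split(x, y, seed=42)
same_seed_matches = all(np.array_equal(a, b) for a, b in zip(first, second))

print("same seed gives same split:", same_seed_matches)
print("train size:", len(first[0]), "test size:", len(first[1]))
```

If this check passes on your machine but your own pipeline still varies, the split itself is unlikely to be the cause, and the inputs passed to it are the next thing to look at.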

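With random_state fixed, train_test_split returns the same split whenever x and y are themselves the same, in the same order, on every run. So there may be some other source of randomness in your code, for example in how x and y are built, that produces the different results. Could you post your complete code?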
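One way to look for that other randomness is to log a fingerprint of x and y just before the split and compare it across runs. The helper below is a hypothetical sketch using only the standard library, and the label lists are invented sample data. It uses hashlib rather than the built-in hash(), because hash() on strings is randomized per process by default and would differ between runs even for identical data.

```python
import hashlib


def fingerprint(values):
    """Return a short digest of a sequence that is stable across runs.

    The digest depends on both the items and their order.
    """
    digest = hashlib.sha256()
    for value in values:
        # repr() is stable across runs for plain ints, floats, and strings.
        digest.update(repr(value).encode("utf-8"))
        digest.update(b"\x00")  # separator between items
    return digest.hexdigest()[:16]


# Invented sample data: the same labels arriving in two different orders.
labels_run_a = ["cat", "dog", "bird", "dog"]
labels_run_b = ["dog", "cat", "bird", "dog"]

print(fingerprint(labels_run_a))
print(fingerprint(labels_run_b))
```

If the fingerprint of y (or x) changes between runs, the data reaching train_test_split is not the same each time, and a fixed random_state cannot compensate for that. Typical suspects would be data built from a set, an unordered database query, or a shuffle without a seed. For NumPy arrays, passing arr.tolist() to the helper is one option.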
