A room in a smart home is fitted with environmental sensors to monitor indoor air quality. An environmental sensor can be anything from a simple air temperature sensor to an indoor air quality measurement system that combines several types of sensors, or a networked sensor. The purpose of these sensors is to determine indoor air quality; their potential for occupancy detection is largely unused. Occupancy detection is a technique for detecting the presence of occupants in a space. Many environmental sensors detect different kinds of gases; here we use CO2 (carbon dioxide) and TVOC (total volatile organic compounds) sensors to detect the gases present in a room. Detecting indoor gases also helps to improve air quality. The CO2 sensor measures the carbon dioxide concentration, whereas the TVOC sensor has a built-in CO2 sensor and detects other gases as well. Many machine learning algorithms can be used to classify occupancy. Previous studies used a Naïve Bayes classifier in the Weka tool to detect occupants. In this project we attempt to detect occupants with the XGBoost algorithm.

**Existing System to detect Occupancy**

Naïve Bayes is a simple probabilistic machine learning classifier. It is based on Bayes' theorem and assumes that all predictors are independent of one another. The model is easy to build because it needs no complicated iterative parameter estimation, which makes it well suited to large datasets.

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Here $P(A \mid B)$ is the posterior probability of class $A$ given predictor $B$; $P(B \mid A)$ is the likelihood, that is, the probability of the predictor given the class; $P(B)$ is the prior probability of the predictor; and $P(A)$ is the prior probability of the class. Reference [1] contrasts Naïve Bayes with two simpler models: the ZeroR model uses no predictor, and the OneR model tries to find the single best predictor [1].
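
To make Bayes' theorem concrete, the following minimal sketch computes a posterior from simple counts. The counts and the event names (`occupied`, a "high" CO2 reading) are hypothetical and are not taken from the data set; they only illustrate how the four terms relate.

```python
def posterior(likelihood, class_prior, predictor_prior):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    if predictor_prior <= 0:
        raise ValueError("P(B) must be positive")
    return likelihood * class_prior / predictor_prior

# Hypothetical counts (not from the data set): 100 readings in total.
n_total = 100
n_occupied = 60             # class A: the room is occupied
n_high = 50                 # predictor B: the CO2 reading is "high"
n_high_and_occupied = 45    # both A and B hold

p_a = n_occupied / n_total                       # P(A)   = 0.6
p_b = n_high / n_total                           # P(B)   = 0.5
p_b_given_a = n_high_and_occupied / n_occupied   # P(B|A) = 0.75

p_a_given_b = posterior(p_b_given_a, p_a, p_b)
print(round(p_a_given_b, 2))  # 0.9
```

The result agrees with the direct count: 45 of the 50 "high" readings were taken while the room was occupied.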

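Issues with the existing system:

- It relies entirely on the assumption that the predictors are independent of one another.
- Zero-frequency problem: a predictor value that never occurs together with a class in the training data gets an estimated probability of zero, which forces the whole posterior for that class to zero.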
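
The zero-frequency issue listed above can be sketched in a few lines. The training values below are hypothetical, and the add-one (Laplace) smoothing shown at the end is only a commonly used remedy; it is not something the existing system is stated to use.

```python
from collections import Counter

def likelihood(value, observed_values, categories, alpha=0.0):
    """Estimate P(value | class) from the values observed for one class.

    alpha = 0 gives the raw relative frequency; alpha = 1 is add-one
    (Laplace) smoothing, shown here only as a common remedy.
    """
    counts = Counter(observed_values)
    return (counts[value] + alpha) / (len(observed_values) + alpha * len(categories))

# Hypothetical training data: discretized CO2 levels seen while the room was occupied.
categories = ["low", "medium", "high"]
co2_when_occupied = ["medium", "high", "high", "medium", "high"]

# "low" never occurs with the class "occupied", so its raw likelihood is 0.
raw = likelihood("low", co2_when_occupied, categories)

# Naive Bayes multiplies the per-predictor likelihoods, so one zero wipes out the product.
other_predictor_likelihood = 0.8  # hypothetical likelihood of a second predictor
print(raw * other_predictor_likelihood)  # 0.0

# With add-one smoothing the estimate becomes (0 + 1) / (5 + 3).
smoothed = likelihood("low", co2_when_occupied, categories, alpha=1.0)
print(smoothed)  # 0.125
```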

**Our Contribution:**

We use the XGBoost algorithm, which is an ensemble learning method. The algorithm is implemented in Python using the SKLearn package. Figure 1 shows a classification example and Figure 2 shows the process flow diagram:

**Figure 1: XGBoost classification example**

**Figure 2: XGBoost classification process flow**
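
As a rough illustration of the ensemble idea behind boosting, the sketch below fits a sequence of one-split trees (stumps), each one trained on the errors left by the previous ones. It is a simplified, pure-Python gradient boosting loop written for this explanation, not XGBoost itself, which additionally uses regularization, second-order gradient information, and deeper trees. The function names, the CO2 readings, and the labels are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(xs, residuals):
    """Least-squares stump on one feature: returns (threshold, left_value, right_value)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the largest value so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    return best[1:]

def fit_boosted_stumps(xs, ys, n_rounds=20, learning_rate=0.5):
    p = sum(ys) / len(ys)               # assumes both classes are present
    base = math.log(p / (1.0 - p))      # initial score: log-odds of the positive class
    scores = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # Residuals of the current ensemble (negative gradient of the log loss).
        residuals = [y - sigmoid(s) for y, s in zip(ys, scores)]
        t, lv, rv = fit_stump(xs, residuals)
        stumps.append((t, lv, rv))
        scores = [s + learning_rate * (lv if x <= t else rv)
                  for x, s in zip(xs, scores)]
    return {"base": base, "learning_rate": learning_rate, "stumps": stumps}

def predict(model, x):
    score = model["base"] + sum(
        model["learning_rate"] * (lv if x <= t else rv)
        for t, lv, rv in model["stumps"])
    return 1 if sigmoid(score) >= 0.5 else 0

# Hypothetical CO2 readings (ppm) with occupancy labels (1 = occupied).
xs = [400, 420, 450, 480, 600, 650, 700, 800]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_boosted_stumps(xs, ys)
print(predict(model, 430), predict(model, 750))  # 0 1
```

Each round adds a small correction rather than replacing the earlier stumps, which is what makes the final classifier an ensemble.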

**Data Set Information:**

We collected the sensor occupancy data from [2]. It contains 205354 instances with 32 attributes. Ten of the 32 attributes have missing values, and we eliminate those values. The attributes are listed in Table 1; outdoor values are ignored entirely.

**Table 1: Attribute Information**

| S.No. | Attribute name | Units (if any) | Type |
|---|---|---|---|
| 1 | wkd {0, 1} | | binary |
| 2 | Time | seconds | numeric |
| 3 | co2 | ppm | numeric |
| 4 | co2_sma | ppm | numeric |
| 5 | co2_var | ppm | numeric |
| 6 | co2_sma_fd | ppm | numeric |
| 7 | co2_sma_sd | ppm | numeric |
| 8 | voc | ppm | numeric |
| 9 | voc_sma | ppm | numeric |
| 10 | voc_var | ppm | numeric |
| 11 | voc_sma_fd | ppm | numeric |
| 12 | voc_sma_sd | ppm | numeric |
| 13 | t1 | Celsius | numeric |
| 14 | t1_sma | Celsius | numeric |
| 15 | t1_var | Celsius | numeric |
| 16 | t1_sma_fd | Celsius | numeric |
| 17 | t1_sma_sd | Celsius | numeric |
| 18 | rh1 | % | numeric |
| 19 | rh1_sma | % | numeric |
| 20 | rh1_var | % | numeric |
| 21 | rh1_sma_fd | % | numeric |
| 22 | rh1_sma_sd | % | numeric |
| 23 | t12 | Celsius | numeric |
| 24 | t12_sma | Celsius | numeric |
| 25 | t12_var | Celsius | numeric |
| 26 | t12_sma_fd | Celsius | numeric |
| 27 | t12_sma_sd | Celsius | numeric |
| 28 | rh12 | % | numeric |
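
The text does not specify whether eliminating missing values means dropping the affected instances or the affected attributes, so the sketch below shows both readings. It assumes that a missing reading is represented as `None`; the helper names and the sample readings are hypothetical.

```python
MISSING = None  # assumption: missing readings are represented as None

def drop_incomplete_rows(rows):
    """Keep only the instances that have a value for every attribute."""
    return [r for r in rows if all(v is not MISSING for v in r.values())]

def drop_incomplete_columns(rows):
    """Keep only the attributes that have a value in every instance."""
    complete = [c for c in rows[0] if all(r[c] is not MISSING for r in rows)]
    return [{c: r[c] for c in complete} for r in rows]

# Hypothetical sample with one missing voc reading.
rows = [
    {"co2": 612.0, "voc": 0.21, "t1": 22.4},
    {"co2": 598.0, "voc": None, "t1": 22.5},
    {"co2": 640.0, "voc": 0.25, "t1": 22.7},
]

print(len(drop_incomplete_rows(rows)))          # 2
print(list(drop_incomplete_columns(rows)[0]))   # ['co2', 't1']
```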

**Table 2: Data Set Information**

| Dataset/Participant | Vacant | Occupied | Total |
|---|---|---|---|
| T1/A | 15503 | 30281 | 45784 |
| T2/A | 11490 | 17342 | 28832 |
| T3/B | 9910 | 10018 | 19928 |
| T4/B | 20495 | 17033 | 37528 |
| T5/C | 2946 | 16998 | 19944 |
| T6/C | 14118 | 5566 | 19684 |
| T7/D | 5359 | 12275 | 17634 |
| T8/D | 4479 | 11541 | 16020 |
| T1-8/A-D | 84300 | 121054 | 205354 |
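
The totals in Table 2 can be checked with a short script, which also gives the class balance of the combined data. The share of the larger class is a useful reference point, since a classifier that always predicted "occupied" would reach roughly that accuracy on the full data set; the split actually used for testing may of course have a slightly different balance.

```python
# (vacant, occupied) counts per dataset, copied from Table 2.
counts = {
    "T1/A": (15503, 30281),
    "T2/A": (11490, 17342),
    "T3/B": (9910, 10018),
    "T4/B": (20495, 17033),
    "T5/C": (2946, 16998),
    "T6/C": (14118, 5566),
    "T7/D": (5359, 12275),
    "T8/D": (4479, 11541),
}

vacant = sum(v for v, _ in counts.values())
occupied = sum(o for _, o in counts.values())
total = vacant + occupied
majority_share = max(vacant, occupied) / total

print(vacant, occupied, total)    # 84300 121054 205354
print(f"{majority_share:.2%}")    # 58.95%
```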

**Results:**

We applied the XGBoost and Naïve Bayes algorithms; the results are as follows:

| Algorithm | Split ratio (train-test) | Accuracy |
|---|---|---|
| XGBoost | 64-36 | 82.62% |
| Naïve Bayes | 64-36 | 81.1% |

We split the 205354 instances into 64% for training and 36% for testing. XGBoost achieved an accuracy of 82.62%, an improvement of 1.52 percentage points over the existing Naïve Bayes system, which achieved 81.1%.
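
A 64-36 split and the accuracy measure can be sketched as follows. This is a minimal illustration, assuming a random shuffle with a fixed seed and rounding of the training size to the nearest integer; the source does not state how the split was drawn, and the helper names are hypothetical.

```python
import random

def train_test_split_indices(n, train_fraction=0.64, seed=0):
    """Shuffle the indices 0..n-1 and cut them into a train part and a test part."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(n * train_fraction)  # assumption: round to the nearest integer
    return indices[:n_train], indices[n_train:]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

train_idx, test_idx = train_test_split_indices(205354)
print(len(train_idx), len(test_idx))          # 131427 73927

# Toy labels to show the accuracy computation (3 of 4 correct).
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))   # 0.75
```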

**References:**

[1] S. Sayad, "Naive Bayesian", Saedsayad.com. [Online]. Available: https://www.saedsayad.com/naive_bayesian.htm

[2] [Online]. Available: http://bit.ly/occupancy_data