Sensor Occupancy Detection Using the XGBoost Algorithm

A room in a smart home is fitted with environmental sensors that monitor indoor air quality. An environmental sensor can be anything from a simple air temperature sensor to an indoor air quality measurement system that combines several sensor types, or a networked sensor. The purpose of these sensors is to determine indoor air quality; their potential for occupancy detection is largely unused. Occupancy detection is a technique for detecting the presence of living and non-living things in a space. Many environmental sensors detect particular gases; here we use CO2 (carbon dioxide) and TVOC (total volatile organic compounds) sensors to detect the gases that reside in a room. Detecting indoor gases makes it possible to improve the quality of the air. The CO2 sensor measures the carbon dioxide concentration, whereas the TVOC sensor has a built-in CO2 sensor and detects other gases as well. Many machine learning algorithms can be used to classify occupancy. In previous studies, a naive Bayes classifier was used to detect occupants with the Weka tool. In this project we attempt to use the XGBoost algorithm to detect occupants.

Existing System to detect Occupancy

The Naïve Bayes classifier is a simple probabilistic machine learning algorithm. It is based on Bayes' theorem and assumes that all predictors are independent of one another. The model is easy to build because it needs no complicated iterative parameter estimation, which makes it well suited to large datasets.

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Here P(A|B) is the posterior probability of the class given the predictor, P(B|A) is the likelihood (the probability of the predictor given the class), P(A) is the class prior probability, and P(B) is the predictor prior probability. Two simpler related models are ZeroR, which uses no predictor, and OneR, which tries to find the single best predictor [1].
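To make the roles of the four probabilities concrete, here is a minimal worked sketch of Bayes' theorem in Python. The events and all numbers are hypothetical and chosen only for illustration (A = "room is occupied", B = "CO2 reading is high"); they do not come from the data set used in this project.

```python
def posterior(likelihood, class_prior, predictor_prior):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if predictor_prior <= 0:
        raise ValueError("P(B) must be positive")
    return likelihood * class_prior / predictor_prior


# Hypothetical probabilities, for illustration only.
p_b_given_a = 0.8      # P(B|A): high CO2 reading given the room is occupied
p_a = 0.3              # P(A): prior probability that the room is occupied
p_b_given_not_a = 0.1  # P(B|not A): high CO2 reading given an empty room

# P(B) from the law of total probability.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

p_a_given_b = posterior(p_b_given_a, p_a, p_b)
print(round(p_a_given_b, 4))
```

With these assumed numbers, a high CO2 reading raises the probability of occupancy well above the prior of 0.3, which is the kind of update the classifier performs for every predictor.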

Issues with existing system: 

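  • It relies entirely on the assumption that the predictors are independent
  • Zero frequency problem: a predictor value never observed together with a class in the training data receives a probability of zero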
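The zero frequency problem can be illustrated with a small sketch. The category names and training values below are hypothetical, and Laplace (add-one) smoothing is shown only as a common remedy; the original study does not say whether it was applied.

```python
from collections import Counter


def conditional_prob(value, observed_values, n_categories, alpha=0.0):
    """Estimate P(value | class) from the values observed for one class.

    alpha=0 gives the raw relative frequency;
    alpha=1 gives Laplace (add-one) smoothing.
    """
    counts = Counter(observed_values)
    return (counts[value] + alpha) / (len(observed_values) + alpha * n_categories)


# Hypothetical discretised CO2 levels seen for the "occupied" class.
occupied_co2 = ["high", "high", "medium", "high"]
categories = ["low", "medium", "high"]

# "low" never occurs with "occupied", so the raw estimate is zero and would
# zero out the whole product of conditional probabilities.
raw = conditional_prob("low", occupied_co2, len(categories))

# With add-one smoothing the estimate is small but non-zero.
smoothed = conditional_prob("low", occupied_co2, len(categories), alpha=1.0)

print(raw, smoothed)
```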

Our Contribution:

We used the XGBoost algorithm, an ensemble learning method. The algorithm is implemented with the Python package SKLearn. An example is shown in Figure 1, and the process flow diagram is shown in Figure 2:

Figure 1: XGBoost classification example

Figure 2: XGBoost classification process flow
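The idea behind boosting, on which XGBoost builds, can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the XGBoost library and not the code used in this project: it fits one-feature decision stumps to the gradient of the logistic loss, and it omits the regularisation and second-order information that XGBoost adds. The CO2 readings, labels, and all function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, residuals):
    """Find the threshold on one feature that minimises squared error."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue  # a split must put samples on both sides
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return t, lm, rm


def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Add stumps one at a time, each fitted to the current residuals."""
    stumps = []
    scores = [0.0] * len(xs)
    for _ in range(n_rounds):
        # Negative gradient of the logistic loss: label minus predicted probability.
        residuals = [y - sigmoid(s) for y, s in zip(ys, scores)]
        t, lm, rm = fit_stump(xs, residuals)
        # Store leaf values already scaled by the learning rate.
        stump = (t, learning_rate * lm, learning_rate * rm)
        stumps.append(stump)
        scores = [s + (stump[1] if x <= t else stump[2]) for x, s in zip(xs, scores)]
    return stumps


def predict(stumps, x):
    """Sum the stump outputs and threshold the score at zero."""
    score = sum(left if x <= t else right for t, left, right in stumps)
    return 1 if score > 0 else 0


# Hypothetical CO2 readings (ppm) and occupancy labels.
co2 = [400, 420, 450, 480, 700, 750, 800, 900]
occupied = [0, 0, 0, 0, 1, 1, 1, 1]

model = boost(co2, occupied)
predictions = [predict(model, x) for x in co2]
print(predictions)
```

Each round corrects the errors left by the previous rounds, which is what makes the combined model an ensemble rather than a single tree.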

Data Set Information:

We collected the sensor occupancy data from [2]. It has 205354 instances in total, with 32 attributes. Ten of those 32 attributes have missing values, which we eliminated. The attributes are named below; outdoor values are ignored entirely.

Table 1: Attribute Information

| S.no | Attribute name | Units (if any) | Type |
|------|----------------|----------------|------|
| 1 | wkd | {0, 1} | binary |
| 2 | Time | seconds | numeric |
| 3 | co2 | ppm | numeric |
| 4 | co2_sma | ppm | numeric |
| 5 | co2_var | ppm | numeric |
| 6 | co2_sma_fd | ppm | numeric |
| 7 | co2_sma_sd | ppm | numeric |
| 8 | voc | ppm | numeric |
| 9 | voc_sma | ppm | numeric |
| 10 | voc_var | ppm | numeric |
| 11 | voc_sma_fd | ppm | numeric |
| 12 | voc_sma_sd | ppm | numeric |
| 13 | t1 | Celsius | numeric |
| 14 | t1_sma | Celsius | numeric |
| 15 | t1_var | Celsius | numeric |
| 16 | t1_sma_fd | Celsius | numeric |
| 17 | t1_sma_sd | Celsius | numeric |
| 18 | rh1 | % | numeric |
| 19 | rh1_sma | % | numeric |
| 20 | rh1_var | % | numeric |
| 21 | rh1_sma_fd | % | numeric |
| 22 | rh1_sma_sd | % | numeric |
| 23 | t12 | Celsius | numeric |
| 24 | t12_sma | Celsius | numeric |
| 25 | t12_var | Celsius | numeric |
| 26 | t12_sma_fd | Celsius | numeric |
| 27 | t12_sma_sd | Celsius | numeric |
| 28 | rh12 | % | numeric |
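The cleaning step described before Table 1 could be sketched as follows. This assumes that "eliminating" means dropping every attribute (column) that has at least one missing value; the text does not state this explicitly. The records and the attribute name "t_out" are hypothetical stand-ins.

```python
def drop_incomplete_columns(rows):
    """Keep only the columns that have a value in every row."""
    columns = list(rows[0].keys())
    complete = [c for c in columns if all(r.get(c) is not None for r in rows)]
    return [{c: r[c] for c in complete} for r in rows]


# Hypothetical records; "t_out" stands in for an attribute with missing values.
rows = [
    {"co2": 612.0, "voc": 0.21, "t1": 22.4, "t_out": None},
    {"co2": 640.0, "voc": 0.25, "t1": 22.6, "t_out": 5.1},
]

cleaned = drop_incomplete_columns(rows)
print(list(cleaned[0].keys()))
```

Dropping whole columns keeps every instance; the alternative of dropping rows with missing values would shrink the data set instead.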

Table 2: Data Set Information



We applied the XGBoost and Naïve Bayes algorithms; the results are as follows:

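| Algorithm | Split Ratio (train-test) | Accuracy |
|-----------|--------------------------|----------|
| Naïve Bayes | 64-36 | 81.1% |
| XGBoost | 64-36 | 82.62% |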

We split the 205354 instances into 64% training data and 36% test data. XGBoost achieved an accuracy of 82.62%, an improvement over the existing system, where Naïve Bayes achieved 81.1%.
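A minimal sketch of a 64-36 split and of the accuracy measure is given below. It assumes a random shuffle with a fixed seed before splitting; the original study does not say how the instances were assigned to each part. The function names and the placeholder data are hypothetical.

```python
import random


def train_test_split(rows, train_fraction=0.64, seed=0):
    """Shuffle the rows and split them into a training and a test part."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


rows = list(range(100))  # placeholder for 100 instances
train, test = train_test_split(rows)
print(len(train), len(test))

# Three of the four hypothetical predictions match the true labels.
acc = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
print(acc)
```

Accuracy is computed on the test part only, so that the reported figure reflects instances the model did not see during training.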


[1] S. Sayad, “Naive Bayesian”, [Online] Available:

[2] [Online] Available:

