
AI-Powered Obstacle Avoidance System with Arduino and Ultrasonic Sensors

Components and supplies

SparkFun Arduino Pro Mini 328 - 5V/16MHz
×1
Ultrasonic Sensor - HC-SR04 (Generic)
×1
SparkFun Dual H-Bridge motor drivers L298
×1
DC motor (generic)
60 RPM DC MOTORS
×2
wheels
×4
Jumper wires (generic)
×1
SparkFun Solder-able Breadboard - Mini
×1
Rechargeable Battery, Lithium Ion
7.2-volt Lithium-Ion Battery
×1
9V battery (generic)
×1
9V Battery Clip
×1
PCBWay Custom PCB
×1
C&K Switches JS Series Switch
×1

Necessary tools and machines

Soldering iron (generic)
Solder Wire, Lead Free
Solder Flux, Soldering
Desoldering Pump, Deluxe SOLDAPULLT®
Wire Stripper & Cutter, 18-10 AWG / 0.75-4mm² Capacity Wires
Hot glue gun (generic)
Tape, Double Sided

Apps and online services

Arduino IDE

About this project

OBSTACLE AVOIDING ROBOT USING AI/REINFORCEMENT LEARNING

Problem Statement: The main objective is for the robot to learn to avoid obstacles within "N" episodes and converge on the optimal action. In this case, let's assume we need our robot to learn the optimal action 'Left' (this matches the code, where action 3 performs the left turn).

Reinforcement Algorithm Used: Q learning

How L298N Drives Two DC Motors:

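The L298N sets each motor's direction by which of its two input pins is HIGH. A minimal desktop sketch of that truth table (plain C++, no Arduino; the four inputs M1..M4 are modeled as booleans, with the same HIGH/LOW patterns as the article's Forward()/Backward()/Left()/Stop() functions):

```cpp
#include <array>

// Model the four L298N inputs (M1..M4 in the Arduino sketch) as booleans:
// true = HIGH, false = LOW. Each motor is driven by one pair of inputs;
// swapping which input of the pair is HIGH reverses that motor.
using Pins = std::array<bool, 4>;

Pins forward()  { return {false, true,  false, true };  } // both motors forward
Pins backward() { return {true,  false, true,  false}; }  // both motors reversed
Pins left()     { return {true,  false, false, true }; }  // motors opposed: spin turn
Pins stop_all() { return {false, false, false, false}; }  // nothing driven: robot stops
```

Note how backward() is forward() with every pin inverted, and a turn drives the two motors in opposite directions.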

How HC-SR04 Sensor calculates distance:


Important Terms in Reinforcement Learning:

1. STATE: This is the situation the robot is in. For a basic obstacle-avoiding robot there are really only 2 states: the 1st state is when there is no obstacle close to it, and the 2nd state is when there is an obstacle in front of it. (When I wrote the code I assumed 10 different states, each expecting the same action, to illustrate a more complex environment.)

2. ACTION: In a particular state the robot performs a particular action. There are 4 actions the robot can perform in the 2nd state: "FORWARD", "BACKWARD", "LEFT" and "STOP". In the 1st state the robot could also perform 4 actions, but to keep things simple I have assumed it performs only one action, "FORWARD" (it makes little sense to consider actions such as LEFT or BACKWARD when there are no obstacles nearby).

int ACTIONS[4] = {0,1,2,3};

/* HERE:
0 = FORWARD
1 = BACKWARD
2 = STOP
3 = LEFT */

3. NEXT STATE: This is the state the robot ends up in after performing a particular "ACTION" in its current "STATE". In the obstacle-avoiding robot's case, the NEXT STATE can be either a "CRASHED" state or a "SURVIVED" state. (Here the SURVIVED state is the same as the starting state the robot is in when its episode starts.)

/*AFTER PERFORMING AN ACTION THE ROBOT GOES INTO THE NEXT STATE, IN THIS CASE OF THE OBSTACLE
AVOIDING ROBOT*/

int NEXT_STATE;
int STATE = 0;
NEXT_STATE = STATE+1;

4. Q TABLE / Q MATRIX: This table has one row per "STATE" and one column per "ACTION". In the obstacle-avoiding robot's case, the table is given by:

float Q[N_STATES][N_ACTIONS] = {{0.0,0.0,0.0,0.0}, 
{0.0,0.0,0.0,0.0},
{0.0,0.0,0.0,0.0},
{0.0,0.0,0.0,0.0},
{0.0,0.0,0.0,0.0},
{0.0,0.0,0.0,0.0},
{0.0,0.0,0.0,0.0},
{0.0,0.0,0.0,0.0},
{0.0,0.0,0.0,0.0},
{0.0,0.0,0.0,0.0}};

Here N_STATES = 10 and N_ACTIONS = 4. A "0.0" entry means any of the 4 possible actions may be performed from that state. If you want to rule out a particular action in a state, replace its "0.0" with "-1.0" in the matrix; "-1.0" indicates that the action cannot be performed in that state. Here it is assumed that we have 10 different states, each expecting the same action. If you want your robot to learn a different action in each state, change the rewards in the reward matrix in the code.
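The -1.0 convention above needs the action-selection code to skip forbidden entries. A hedged sketch of such a helper (my own illustration, not part of the article's code; note the sentinel would clash with a genuinely learned Q value of exactly -1.0):

```cpp
// Hypothetical helper: pick the best allowed action for one Q-table row,
// treating the sentinel -1.0 as "this action is not permitted here".
const float FORBIDDEN = -1.0f;

int best_allowed_action(const float q_row[4]) {
    int best = -1;
    for (int a = 0; a < 4; ++a) {
        if (q_row[a] == FORBIDDEN) continue;                  // skip disallowed actions
        if (best == -1 || q_row[a] > q_row[best]) best = a;   // keep the largest Q
    }
    return best;  // -1 if every action in this state is forbidden
}
```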

5. TERMINAL STATE: This is the last state the robot can be in. For the obstacle-avoiding robot this state doesn't exist, as we don't have any terminal state and want to keep our robot learning forever.

6. REWARD MATRIX: This table or matrix is used to give rewards to the robot for certain actions. The reward is positive or negative depending upon the quality of the action.

int REWARDS[STATES][NUMBER_OF_ACTIONS] = {{-10,-2,-1,10}, 
{-10,-2,-1,10},
{-10,-2,-1,10},
{-10,-2,-1,10},
{-10,-2,-1,10},
{-10,-2,-1,10},
{-10,-2,-1,10},
{-10,-2,-1,10},
{-10,-2,-1,10},
{-10,-2,-1,10}};

7. ENVIRONMENT: This can be thought of as the world the robot lives in. For example, we humans live on Earth, so Earth is our environment.

Hyperparameters in Reinforcement Learning:

1. LEARNING RATE (ALPHA): The learning rate, or step size, determines to what extent newly acquired information overrides old information. A factor of 0 makes the agent learn nothing (exclusively exploiting prior knowledge), while a factor of 1 makes the agent consider only the most recent information (ignoring prior knowledge). In fully deterministic environments a learning rate of ALPHA = 1.0 is optimal. When the problem is stochastic, the algorithm converges under some technical conditions on the learning rate that require it to decrease to zero. In practice a constant learning rate, such as ALPHA = 0.1, is often used.

float ALPHA = 0.2;

2. DISCOUNT FACTOR (GAMMA): The discount factor determines the importance of future rewards. A factor of 0 will make the agent "myopic" (short-sighted), considering only current rewards, while a factor approaching 1 will make it strive for long-term reward. If the discount factor meets or exceeds 1, the action values may diverge: for GAMMA = 1.0 without a terminal state, or if the agent never reaches one, all environment histories become infinitely long, and utilities with additive, undiscounted rewards generally become infinite. Even with a discount factor only slightly lower than 1, Q-function learning leads to propagation of errors and instabilities when the value function is approximated with an artificial neural network; in that case, starting with a lower discount factor and increasing it towards its final value accelerates learning.

float GAMMA = 0.9;

3. EXPLORATION RATE (EPSILON): This parameter decides to what extent the robot should explore the environment. Exploring the environment means performing random actions and analyzing the results through the Q values. Usually in Q-learning (unlike SARSA) this parameter is gradually reduced to nothing as the robot learns more and more. In this project, however, we are not going to get rid of Epsilon, as we don't have a terminal state: Epsilon decays to some extent and is then reset whenever it drops below a threshold value. This makes sure the robot keeps exploring for its whole lifetime, just as we humans do.

float EPSILON = 0.75;
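The decay-and-reset cycle described above can be sketched in a few lines (the decay factor 0.98, threshold 0.5, and reset value 0.9 match the full code later in the article):

```cpp
// Epsilon shrinks a little each episode; once it drops below the
// threshold it is reset, so exploration never fully stops
// (this robot has no terminal state).
float decay_epsilon(float eps, float factor = 0.98f,
                    float threshold = 0.5f, float reset = 0.9f) {
    eps *= factor;
    if (eps < threshold) eps = reset;
    return eps;
}
```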

Q-LEARNING ALGORITHM:

  • Initialize the Q-values table, Q(s, a). I have initialized these values to 0.0.
  • Observe the current state, s.
  • Choose an action, a, for that state based on one of the action selection policies (ε-soft, ε-greedy or softmax).
//////////////////////////Epsilon Greedy Policy//////////////////////////////

PROB = RANDOM(EPSILON);
if (PROB <= EPSILON)  //EXPLORE THE ACTIONS
{
  ACTION = random(0,4);
  FLAG = 2;
}
else                  //EXPLOIT THE ACTIONS FROM Q TABLE
{
  ACTION = ARGMAX(Q,STATE);
  FLAG = 2;
}
  • Take the action, and observe the reward, r, as well as the new state, s'.
  • Update the Q-value for the state using the observed reward and the maximum reward possible for the next state. The updating is done according to the formula and parameters described above.
  • Set the state to the new state, and repeat the process until a terminal state is reached.
  • To understand Q-learning better, visit this link: https://towardsdatascience.com/a-beginners-guide-to-q-learning-c3e2a30a653c
///////////////////Implementation of Q_Learning Formula/////////////////////////

Q_OLD = Q_TABLE[S][A];
Q_MAX = MAX(Q_TABLE, NEXT_S);
Q_NEW = (1-LEARNING_RATE)*Q_OLD + LEARNING_RATE*(R + DISCOUNT_FACTOR*Q_MAX);
Serial.print("Q VALUE : ");
Serial.println(Q_NEW);
Q_TABLE[S][A] = Q_NEW;
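The whole training loop can also be simulated on a desktop without any hardware. This is my own simplified sketch, not the article's Arduino code: it always explores (no epsilon-greedy), uses the article's reward row {-10,-2,-1,10} for every state, and steps through the 10 states cyclically. Under those assumptions the learned best action should come out as 3 (the +10 turn) in every state:

```cpp
#include <cstdlib>

const int N_STATES = 10, N_ACTIONS = 4;
float Q[N_STATES][N_ACTIONS] = {};                 // Q table starts at 0.0
const int REWARDS[N_ACTIONS] = {-10, -2, -1, 10};  // same reward row for every state

// Largest Q value in one row (used for the bootstrap target).
float max_q(int s) {
    float m = Q[s][0];
    for (int a = 1; a < N_ACTIONS; ++a) if (Q[s][a] > m) m = Q[s][a];
    return m;
}

// Index of the largest Q value in one row (the greedy action).
int argmax_q(int s) {
    int best = 0;
    for (int a = 1; a < N_ACTIONS; ++a) if (Q[s][a] > Q[s][best]) best = a;
    return best;
}

// Repeatedly apply Q_new = (1-alpha)*Q_old + alpha*(r + gamma*max_q(next)).
void train(int steps, float alpha, float gamma) {
    int state = 0;
    for (int i = 0; i < steps; ++i) {
        int action = std::rand() % N_ACTIONS;      // pure exploration in this sketch
        int next = (state + 1) % N_STATES;
        float target = REWARDS[action] + gamma * max_q(next);
        Q[state][action] = (1 - alpha) * Q[state][action] + alpha * target;
        state = next;
    }
}
```

After train(5000, 0.2f, 0.9f), argmax_q(s) should be 3 for every state, mirroring what the robot learns on hardware.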

Working Video: Don't forget to check out the working video of the AI robot :)


Code

  • Obstacle Avoidance Robot using Q-LEARNING (C/C++)
This is the main code of the artificial-intelligence-based obstacle avoidance robot.
/* This is an Obstacle Avoiding Robot using Reinforcement Learning/AI
   Author of this Project : Varun Walimbe
   Algorithm used in this project: Q learning

How Obstacle Avoiding Works?
1.The ultrasonic sensor measures the distance to the obstacle using
  its Echo and Trig pins.
2.If the measured distance is less than 15 cm there is an obstacle
  nearby; otherwise the robot is safe and continues forward.
3.If an obstacle is detected the robot takes a left or right turn
  depending on the situation.

How AI based Obstacle Avoidance Works?(Q learning)
1.Here the 1st step above remains the same; the 2nd step, however,
  is different.
2.A list of the robot's actions is initialised first. In this case
  the actions are: Left, Forward, Backward, Stop.
3.When the robot comes near an obstacle it has to perform an action.
  Note that the robot doesn't know which action to take, as it is not
  pre-programmed and is going to learn on its own to avoid obstacles.
4.Per the reward matrix below:
  When the robot stops in front of an obstacle it gets a reward of -1
  When the robot goes backward it receives a reward of -2
  When the robot keeps moving forward, ignoring the obstacle, it receives a reward of -10
  When the robot turns left as soon as the obstacle is detected it gets a reward of +10
5.In this way the robot learns on its own to avoid obstacles through the reward mechanism.*/

//////////ROBOT'S HARDWARE PARAMETERS////////////////////
int TRIG_PIN = 7;
int ECHO_PIN = 8;
int duration;
float distance;

int M1 = 13;
int M2 = 12;
int M3 = 11;
int M4 = 10;

bool Obstacle = false;
int FLAG;
/////////////////////////END/////////////////////////////

/////////////////////////////////////Q LEARNING PARAMETERS///////////////////////////////////////////
float ALPHA = 0.1;    //LEARNING RATE
float GAMMA = 0.5;    //DISCOUNT FACTOR
float EPSILON = 0.90; //EXPLORATION PARAMETER
int REWARD;           //REWARD FOR PERFORMING AN ACTION
int EPISODES  = 100;

int STATE;                        // CURRENT STATE OF THE ROBOT
int ACTION = 0;                   //ACTION PERFORMED BY THE ROBOT (0:FORWARD, 1:BACKWARD, 2:STOP, 3:LEFT)
float PROB;                       //USED FOR EPSILON-GREEDY ACTION SELECTION
bool ACTION_TAKEN = false;        //THIS VARIABLES TELLS US WHETHER AN ACTION IS TAKEN OR NOT
int NEXT_STATE;                   // NEXT STATE OF THE ROBOT
const int STATES = 10;            //NUMBER OF STATES IN ENVIRONMENT 
int ACTIONS[4] = {0,1,2,3};
const int NUMBER_OF_ACTIONS = 4; //TOTAL WE HAVE 4 ACTION FORWARD,BACKWARD,LEFT AND STOP


/*THIS IS THE Q MATRIX OR Q TABLE. THIS IS BASICALLY THE DIARY THE ROBOT LOOKS INTO
BEFORE PERFORMING AN ACTION. BASED ON THE ACTION THE ROBOT EARNS A REWARD AND THE Q VALUE
IS UPDATED IN THIS Q TABLE. HERE I HAVE CONSIDERED 10 STATES AND ASSUMED ALL STATES ARE
DIFFERENT EVEN THOUGH THEY ARE THE SAME. BASICALLY AN OBSTACLE AVOIDING ROBOT HAS ONLY TWO STATES,
i.e.:
1: WHEN IT IS AWAY FROM THE OBSTACLE
2: WHEN IT IS NEAR THE OBSTACLE
BUT HERE, TO ILLUSTRATE A MORE COMPLEX ENVIRONMENT, I HAVE ASSUMED THERE ARE 10 DIFFERENT STATES,
EACH EXPECTING THE SAME/A DIFFERENT ACTION.*/

float Q[STATES][NUMBER_OF_ACTIONS] = {{0.0,0.0,0.0,0.0},  //MOST IMPORTANT OF ALL IS THE Q TABLE.
                                      {0.0,0.0,0.0,0.0},  //IT IS FORMED WITH STATES AS ITS ROWS
                                      {0.0,0.0,0.0,0.0},  //AND NUMBER OF ACTIONS AS ITS COLUMNS.
                                      {0.0,0.0,0.0,0.0},  //INITIALISED TO ZERO AT THE START,
                                      {0.0,0.0,0.0,0.0},  //IT WILL BE UPDATED DURING TRAINING.
                                      {0.0,0.0,0.0,0.0},
                                      {0.0,0.0,0.0,0.0},
                                      {0.0,0.0,0.0,0.0},
                                      {0.0,0.0,0.0,0.0},
                                      {0.0,0.0,0.0,0.0}};

/*THIS IS THE REWARD MATRIX OR REWARD TABLE. IT IS RESPONSIBLE FOR GIVING
THE ROBOT A REWARD FOR PERFORMING A PARTICULAR ACTION. IT STORES THE REWARD FOR
EACH ACTION TAKEN IN EACH STATE. THE REWARD IS POSITIVE IF THE ACTION
PERFORMED IS GOOD AND NEGATIVE IF THE ACTION YIELDS BAD RESULTS.*/

int REWARDS[STATES][NUMBER_OF_ACTIONS] = {{-10,-2,-1,10}, 
                                          {-10,-2,-1,10}, 
                                          {-10,-2,-1,10}, 
                                          {-10,-2,-1,10},  
                                          {-10,-2,-1,10},
                                          {-10,-2,-1,10},
                                          {-10,-2,-1,10},
                                          {-10,-2,-1,10},
                                          {-10,-2,-1,10},
                                          {-10,-2,-1,10}};            
////////////////////////////////////////////END///////////////////////////////////////////////////

////////////////Q LEARNING UPDATE PARAMETERS////////////
float Q_OLD;
float Q_NEW;
float Q_MAX;
//////////////////////////END//////////////////////////

void setup() 
{
  Serial.begin(9600);
  pinMode(TRIG_PIN,OUTPUT);
  pinMode(ECHO_PIN,INPUT);
  pinMode(M1,OUTPUT);
  pinMode(M2,OUTPUT);
  pinMode(M3,OUTPUT);
  pinMode(M4,OUTPUT);  
  randomSeed(analogRead(A0));
  STATE = 0;
  Serial.println("TRAINING WILL START IN 5 SECONDS:  ");
  delay(5000);
} 

////////////////////////////ROBOT'S FUNCTIONS/////////////////////////////////
void Forward()
{
  digitalWrite(M1,LOW);
  digitalWrite(M2,HIGH);
  digitalWrite(M3,LOW);
  digitalWrite(M4,HIGH); 
}

void Backward()
{
  digitalWrite(M1,HIGH);
  digitalWrite(M2,LOW);
  digitalWrite(M3,HIGH);
  digitalWrite(M4,LOW);
}

void Left()
{
  digitalWrite(M1,HIGH);
  digitalWrite(M2,LOW);
  digitalWrite(M3,LOW);
  digitalWrite(M4,HIGH);
}

void Right()
{
  digitalWrite(M1,LOW);
  digitalWrite(M2,HIGH);
  digitalWrite(M3,HIGH);
  digitalWrite(M4,LOW);
}

void Stop()
{
  digitalWrite(M1,LOW);
  digitalWrite(M2,LOW);
  digitalWrite(M3,LOW);
  digitalWrite(M4,LOW);
}

bool Obstacle_Avoider()
{
  digitalWrite(TRIG_PIN, LOW);        //ENSURE A CLEAN PULSE
  delayMicroseconds(2);
  digitalWrite(TRIG_PIN, HIGH);       //10 US TRIGGER PULSE
  delayMicroseconds(10);
  digitalWrite(TRIG_PIN, LOW);

  duration = pulseIn(ECHO_PIN, HIGH);
  distance = (duration/2)/29.1;       //MICROSECONDS TO CENTIMETRES
  
  if(distance < 15)
  { 
    Obstacle = true;
  }
  else
  {
    Obstacle = false;
  }
  
  delay(10);
  return Obstacle;
}
////////////////////////////////////////////END////////////////////////////////////////////////

///////////////////////////////ROBOT'S Q LEARNING FUNCTIONS////////////////////////////////////

float RANDOM(float EXPLORATION_PARAMETER)
{
  /*THIS FUNCTION RETURNS A RANDOM PROBABILITY IN [0,1)
  WHICH DECIDES WHETHER THE ACTION TAKEN IS RANDOM
  OR FROM THE Q_TABLE. (THE PARAMETER IS UNUSED;
  IT IS KEPT FOR READABILITY AT THE CALL SITE.)*/

  float RANDOM_VARIABLE = random(0,100);
  float PROBABILITY = RANDOM_VARIABLE/100.0;

  return PROBABILITY;
}


float DECAY(float PARAMETER)
{
  /*THIS FUNCTION REDUCES
  EPSILON (THE EXPLORATION PARAMETER) OVER
  TIME, SO THE ROBOT RELIES MORE AND MORE ON
  WHAT IT HAS LEARNED. (IN THE MAIN LOOP EPSILON
  IS RESET WHEN IT DROPS TOO LOW, SO EXPLORATION
  NEVER FULLY STOPS.)*/
 
  PARAMETER = PARAMETER*0.98; //PARAMETER HERE IS THE EPSILON
  return PARAMETER;
}

int GET_STATE()
{
  int STATE_NUMBER;
  STATE_NUMBER = random(0,10);
  return STATE_NUMBER;
}


float MAX(float Q_Table[][4],int NEXT_S)
{ 
  /*THIS FUNCTION FINDS THE BIGGEST NUMBER
  IN Q_TABLE[NEXT_STATE]. ITS MAIN ROLE IS
  TO SUPPLY THE Q_MAX PARAMETER. INITIALISING
  FROM THE FIRST ENTRY (NOT FROM 0.0) KEEPS IT
  CORRECT EVEN WHEN ALL Q VALUES ARE NEGATIVE.*/

  float MAX_VALUE = Q_Table[NEXT_S][0];

  for(int b = 1; b <= 3; b++)
     {
      if(Q_Table[NEXT_S][b] > MAX_VALUE)
        {
          MAX_VALUE = Q_Table[NEXT_S][b];
        }
     }
  return MAX_VALUE;
}


int ARGMAX(float Q_Table[][4],int S)
{
  /*THIS FUNCTION FINDS THE INDEX OF THE
  BIGGEST Q VALUE IN Q_TABLE[STATE].*/
  
   int MAX_INDEX = 0;

   for(int u = 1; u <= 3; u++)
     {
       if(Q_Table[S][u] > Q_Table[S][MAX_INDEX])
         {
           MAX_INDEX = u;
         }
     }
   return MAX_INDEX;
}

void Update(float Q_TABLE[][4] , int S, int NEXT_S, int A, int ACTIONS[], int R, float LEARNING_RATE, float DISCOUNT_FACTOR)
{
  /*THIS FUNCTION UPDATES THE Q TABLE AND Q VALUES. THIS UPDATE KEEPS HAPPENING UNTIL THE
  MAIN LOOP ENDS. AT THE END OF THE EPISODES THE Q TABLE IS FILLED WITH VARIOUS VALUES: THE GREATER
  THE VALUE, THE GREATER THE IMPORTANCE THE ACTION HAS IN THAT PARTICULAR STATE. "Q_OLD" IS THE OLD
  VALUE IN THE Q MATRIX; THIS IS THE VALUE THAT GETS UPDATED. "Q_NEW" IS THE NEW Q VALUE
  CALCULATED BY THE Q-LEARNING FORMULA. THE Q-LEARNING FORMULA USED HERE IS BASED ON THE
  BELLMAN EQUATION AND USES A TEMPORAL DIFFERENCE LEARNING APPROACH. (A MONTE CARLO APPROACH
  WOULD NOT WORK IN THIS CASE OF AN OBSTACLE AVOIDING ROBOT.)*/

  Q_OLD = Q_TABLE[S][A];                        
  Q_MAX = MAX(Q_TABLE, NEXT_S);
  Q_NEW = (1-LEARNING_RATE)*Q_OLD + LEARNING_RATE*(R + DISCOUNT_FACTOR*Q_MAX); 
  Serial.print("Q VALUE : ");
  Serial.println(Q_NEW);
  Q_TABLE[S][A] = Q_NEW;                          
}

///////////////////////////////////////////////////////////END///////////////////////////////////////////////////////////////


/////////////////////////////////////////START OF MAIN LOOP/////////////////////////////////////////////////
void loop() 
{
  /////////////////////////////////////////TRAINING////////////////////////////////////////////
  for(int I =0; I<EPISODES; I++)
  {
    Serial.println("START :");
    ACTION_TAKEN = false;
    FLAG = 0;
       while(true)
          {
            Forward();
            Obstacle = Obstacle_Avoider();
            if(Obstacle == true)
            {
              NEXT_STATE = STATE+1;
              
              if(NEXT_STATE == 10)
                {
                 NEXT_STATE = 0;
                }

               if(NEXT_STATE < 0)
                {
                 NEXT_STATE = 0;
                }
               
              Serial.print("STATE: ");
              Serial.println(STATE);
              FLAG = 1;
              break;
            }
          }
     if(FLAG ==1)
     {
       PROB = RANDOM(EPSILON);
        if (PROB<=EPSILON)     //EXPLORE THE ACTIONS 
          {
            ACTION = random(0,4);
            FLAG = 2;
          }
        else                  //EXPLOIT THE ACTIONS FROM Q TABLE
          {
            ACTION = ARGMAX(Q,STATE);
            FLAG = 2;
          }
     }   

     if(FLAG ==2)
     {
         if(ACTION == 0)
         {
          Forward();
          delay(1500);
          Stop();
          REWARD = REWARDS[STATE][ACTION];
         }

         if(ACTION == 1)
         {
          Backward();
          delay(2500);
          Stop();
          REWARD = REWARDS[STATE][ACTION];
         }

         if(ACTION == 2)
         {
          Stop();
          REWARD = REWARDS[STATE][ACTION];
         }

         if(ACTION == 3)
         {
          Left();
          delay(2000);
          Stop();
          REWARD = REWARDS[STATE][ACTION];
         }

        ACTION_TAKEN = true;  
        delay(500); 
     }

   if(ACTION_TAKEN == true)
     {
      Update(Q,STATE,NEXT_STATE,ACTION ,ACTIONS,REWARD,ALPHA ,GAMMA);
      STATE = NEXT_STATE;
      EPSILON = DECAY(EPSILON);
      if(EPSILON < 0.5)
       {
        EPSILON = 0.9;   //RESET SO THE ROBOT NEVER STOPS EXPLORING
       }
      Serial.print("EPISODE ENDED : ");
      Serial.println(I);
      delay(7000);
     }       
  }
  /////////////////////////////////////END OF TRAINING///////////////////////////////////

 //////////////////////////////////////EVALUATION//////////////////////////////////////////
 /*USE THIS TO CHECK WHETHER YOUR Q VALUES ARE RIGHT OR WRONG. IF ALL Q VALUES
 LOOK RIGHT/ACCURATE, COMMENT THIS SECTION OUT. */
 for(int y = 0; y<=9 ; y++)
   {
    Serial.println("SET OF Q VALUES  WILL START:");
    for(int l = 0; l<=3; l++)
      {
        Serial.print("Q VALUE :");
        Serial.println(Q[y][l]);
        delay(2000);
      }
     delay(2000);
   }
   Serial.println("EVALUATION ENDED");
////////////////////////////////END OF EVALUATION/////////////////////////////////////////

////////////////////////////////////////TESTING////////////////////////////////////////////
while(true)
 {
  Forward();
  Obstacle = Obstacle_Avoider();
  if(Obstacle == true)
   {
     STATE = GET_STATE();
     ACTION = ARGMAX(Q,STATE);
     Serial.print("ACTION TAKEN: ");
     Serial.println(ACTION);

     if(ACTION ==0)
      {
        Forward();
        delay(1500);
        Stop();
      }

     if(ACTION == 1)
       {
        Backward();
        delay(1500);
        Stop();
       }
     if(ACTION == 2)
       {
        Stop();
       }

     if(ACTION == 3)
       {
        Left();
        delay(2000);
        Stop();
       }
     }
  }
  //////////////////////////////////////////////////END OF TESTING////////////////////////////////////////////////////////////
}
//////////////////////////////////////////////////////END OF MAIN LOOP////////////////////////////////////////////////////////

Schematics

obstacle_avoiding_robot_using_ai_DcoMCWIOFm.fzz
