CBOW training steps-
Learns the embedding from the context words.
One-hot encoding of all the words is done. With V distinct words, the vocabulary size is V and every word is represented by a 1×V vector.
The size of the embedding vector is the number of neurons in the hidden layer (N).
There are 2 weight matrices: one of size V×N (input to hidden) and one of size N×V (hidden to output).
The objective is to obtain the weight matrix (N×V) as the embedding of the V vocabulary words.
While training, every input word's one-hot encoded vector is passed through the network and the weights for the output word get trained. The error (the difference between the softmax output of the hidden-to-output layer and the actual output value) is back-propagated to optimise the weights, as in the sketch below.
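A minimal NumPy sketch of one CBOW update, assuming a toy vocabulary of size V = 10 and embedding size N = 4; the names W_in, W_out, one_hot, softmax, cbow_step and the learning rate are illustrative, not part of any particular library.

```python
import numpy as np

# Hypothetical toy setup: V = vocabulary size, N = hidden-layer / embedding size.
V, N = 10, 4
rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, N))   # input -> hidden weights (V x N)
W_out = rng.normal(scale=0.1, size=(N, V))  # hidden -> output weights (N x V)

def one_hot(idx, size):
    v = np.zeros(size)
    v[idx] = 1.0
    return v

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cbow_step(context_ids, target_id, lr=0.05):
    """One CBOW update: average the context embeddings, predict the target word,
    back-propagate the softmax error into both weight matrices."""
    global W_in, W_out
    h = W_in[context_ids].mean(axis=0)        # (N,) hidden layer = mean of context rows
    y_pred = softmax(h @ W_out)               # (V,) predicted distribution over the vocabulary
    err = y_pred - one_hot(target_id, V)      # (V,) softmax output minus actual one-hot output
    grad_W_out = np.outer(h, err)             # (N, V) gradient for hidden -> output weights
    grad_h = W_out @ err                      # (N,) gradient reaching the hidden layer
    W_out -= lr * grad_W_out
    # each context word shares the hidden-layer gradient equally
    for c in context_ids:
        W_in[c] -= lr * grad_h / len(context_ids)

# e.g. context words with ids 1, 3, 5 predicting the target word with id 2
cbow_step([1, 3, 5], 2)
```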

Skip-Gram training steps-
The context words are learnt from the target word.
One-hot encoding of all the words is done. With V distinct words, the vocabulary size is V and every word is represented by a 1×V vector.
The size of the embedding vector is the number of neurons in the hidden layer (N).
There are 2 weight matrices: one of size V×N (input to hidden) and one of size N×V (hidden to output).
The objective is to obtain the weight matrix (V×N) as the embedding of the V vocabulary words.
While training, in one iteration the input is the target word and the outputs are the context words (3 outputs in the image below).
The errors for the 3 outputs (context words) are summed and back-propagated to optimise the weights, as in the sketch below.
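A minimal NumPy sketch of one Skip-Gram update, mirroring the toy setup above (V = 10, N = 4); the names W_in, W_out, one_hot, softmax, skipgram_step and the learning rate are illustrative assumptions, not a definitive implementation.

```python
import numpy as np

# Hypothetical toy setup, mirroring the CBOW sketch: V = vocab size, N = embedding size.
V, N = 10, 4
rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, N))   # input -> hidden (V x N); rows become the embeddings
W_out = rng.normal(scale=0.1, size=(N, V))  # hidden -> output (N x V)

def one_hot(idx, size):
    v = np.zeros(size)
    v[idx] = 1.0
    return v

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def skipgram_step(target_id, context_ids, lr=0.05):
    """One Skip-Gram update: the target word is the input, each context word is an
    output; the per-output softmax errors are summed, then back-propagated once."""
    global W_in, W_out
    h = W_in[target_id]                      # (N,) hidden layer = the target word's embedding row
    y_pred = softmax(h @ W_out)              # (V,) prediction shared by every output position
    # sum the errors over all context-word outputs before back-propagating
    err_sum = np.zeros(V)
    for c in context_ids:
        err_sum += y_pred - one_hot(c, V)
    grad_h = W_out @ err_sum                 # (N,) gradient reaching the hidden layer
    W_out -= lr * np.outer(h, err_sum)       # update hidden -> output weights
    W_in[target_id] -= lr * grad_h           # update the target word's embedding row

# e.g. target word id 2 predicting the 3 context words with ids 1, 3, 5
skipgram_step(2, [1, 3, 5])
```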
