This video is part of the Advanced HIP Workshop playlist.
Shared memory is readable and writable by every thread in the block, even when the block consists of more than one thread team. However, since threads and thread teams in a block do not always finish their work at the same time, block-level synchronisation is required before threads can safely read what other threads have written.
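As a minimal sketch of this pattern (the kernel and variable names are illustrative, not taken from the workshop), a block of threads can stage data in `__shared__` memory, synchronise at a block-level barrier with `__syncthreads()`, and only then read elements that other threads wrote:

```
#include <hip/hip_runtime.h>

#define BLOCK_SIZE 256  // assumed block size for this illustration

// Reverses BLOCK_SIZE elements of d_data in place using shared memory.
__global__ void reverse_block(float* d_data) {
    // Visible to, and shared by, every thread in the block.
    __shared__ float tile[BLOCK_SIZE];

    int tid = threadIdx.x;

    // Each thread writes its own element into shared memory.
    tile[tid] = d_data[tid];

    // Block-level barrier: no thread proceeds until every
    // thread in the block has completed its write above.
    __syncthreads();

    // Now it is safe to read an element written by another thread.
    d_data[tid] = tile[BLOCK_SIZE - 1 - tid];
}
```

Without the `__syncthreads()` call, a thread could read `tile[BLOCK_SIZE - 1 - tid]` before the thread that owns that element has written it, which is exactly the hazard block-level synchronisation prevents.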
Shared memory is often located in high-bandwidth, low-latency cache on the compute device, and can provide a speed boost by virtue of being close, in terms of both latency and bandwidth, to the hardware threads that execute a kernel.