Languages build meaning out of a small set of reusable units. In spoken languages, those units are consonant and vowel sounds. The International Phonetic Alphabet (IPA) aims to provide a symbol for every distinctive sound found in the world's spoken languages. No natural language contains all of these sounds, or even most of them, so choosing just a handful for your conlang is perfectly normal.
Using the IPA, linguists describe each language's sound inventory: the set of sounds it uses and how they are pronounced. A language's sound inventory consists of its phonemes, the sounds that can distinguish one word from another, written in IPA notation (conventionally between slashes, as in /p/). A language can also contain sounds that aren't phonemic. Examples include allophones, which are predictable variants of a phoneme (like the aspirated [pʰ] in English "pin" versus the plain [p] in "spin"), and sounds that appear only in loanwords.
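The IPA exists because ordinary spelling is an unreliable guide to pronunciation. It gives each sound its own symbol, so the same symbol means the same sound no matter which language is being transcribed.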
After deciding the goals for your conlang, you'll need to decide which consonant and vowel sounds to include. Look through the IPA chart, or the sound inventories of languages you speak, are learning, or simply like. Many online IPA charts include audio recordings that can help with pronunciation. Keep in mind what David J. Peterson once said: romanization and orthography are not synonymous. The same applies to English spelling and IPA transcription, since an IPA symbol doesn't necessarily stand for the sound its lookalike letter spells in English. The most obvious example is [j]. In the IPA it represents the palatal approximant, the "y" sound in "yes", while in English the letter "j" spells the voiced postalveolar affricate [dʒ] heard at the beginning of words like "jellybean".
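English isn't a typical sample, either. Many languages have sounds that English lacks, and English has some sounds that are uncommon elsewhere. The two sounds spelled "th" in English, as in "thin" [θ] and "this" [ð], are the dental fricatives. They're relatively rare cross-linguistically, since most languages lack them. Standard Arabic is one of the languages that does have them, and it also has sounds English lacks entirely, such as the pharyngeal fricatives [ħ] and [ʕ]. If you're creating more than one language for a fictional world, look up how common each sound is across real languages and keep those frequencies in mind. You can bend these tendencies, but try not to go overboard.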
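If you're struggling to decide on sounds, look for words in other languages that contain unusual sounds you like, and note those sounds down. Also, keep sound symmetry in mind. Sound inventories tend to be symmetrical: sounds with the same manner of articulation tend to appear at the same set of places, and contrasts such as voicing tend to run through a whole series. For example, a language with /p t k/ often also has /b d g/. This is a tendency rather than a rule, and many languages have gaps. The place of articulation is where in the vocal tract the airflow is obstructed (the lips, the teeth, the soft palate, and so on), and the manner of articulation is how it is obstructed.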
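If you keep your inventory in a script rather than a table, you can check it for gaps automatically. The sketch below is a hypothetical helper, not a standard tool. It assumes each consonant is described by three features (place, manner, and voicing), uses a small made-up inventory, and treats any place where a series has no sound as a gap.

```python
from itertools import product

# Hypothetical sample inventory: IPA symbol -> (place, manner, voiced?).
# The feature labels are illustrative assumptions, not a standard scheme.
INVENTORY = {
    "p": ("bilabial", "plosive", False),
    "b": ("bilabial", "plosive", True),
    "t": ("alveolar", "plosive", False),
    "d": ("alveolar", "plosive", True),
    "k": ("velar", "plosive", False),
    "m": ("bilabial", "nasal", True),
    "n": ("alveolar", "nasal", True),
}


def find_gaps(inventory):
    """Return (place, manner, voiced) cells that a fully symmetrical
    inventory would fill but this one leaves empty.

    A "series" is any manner/voicing pair used by at least one sound;
    a gap is a place used somewhere in the inventory where that
    series has no sound."""
    places = sorted({place for place, _, _ in inventory.values()})
    series = sorted({(manner, voiced) for _, manner, voiced in inventory.values()})
    present = set(inventory.values())
    return [
        (place, manner, voiced)
        for (manner, voiced), place in product(series, places)
        if (place, manner, voiced) not in present
    ]


if __name__ == "__main__":
    for gap in find_gaps(INVENTORY):
        print(gap)
```

With the sample inventory, the helper reports the velar nasal /ŋ/ and the voiced velar plosive /g/ as gaps. Treat its output as suggestions rather than errors, since real languages have gaps too.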
Plosives completely block the airflow and then release it. Fricatives force air through a narrow gap, creating turbulent noise (friction). Affricates begin as plosives and end as fricatives. Nasals block the mouth but let air pass out through the nose. Trills involve the repeated vibration of a flexible articulator, such as the lips, the tip of the tongue, or the uvula. Taps and flaps are like trills, but with only a single brief contact. Approximants bring two articulators close together, but not close enough to create friction, as in the English [j] of "yes" and [w] of "wet". Laterals let air flow around the sides of the tongue, as in English [l].

Several umbrella terms group these manners together. "Liquids" covers the r-like sounds (rhotics, which may be trills, taps/flaps, or approximants) and the l-like sounds (laterals). Liquids, nasals, and other approximants, along with vowels, are called sonorants or resonants, while plosives, fricatives, and affricates are called obstruents. Fricatives, approximants, and vowels, in which air keeps flowing through the mouth, are called continuants; many analyses count liquids as continuants too. Finally, plosives, affricates, and nasals, which all involve a complete closure in the mouth, are called occlusives (though to some people, "stops" is an umbrella term for plosives and affricates).
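Because these umbrella terms overlap, a single manner can belong to several classes at once. As a rough sketch, the groupings can be written as Python sets. The manner labels are assumptions, and so is the choice to leave trills, taps, and laterals out of the continuants, since analyses differ on those borderline cases.

```python
# Umbrella terms as sets of manner labels. "approximant" here means a
# central approximant such as [j] or [w]; "vowel" is included as a label
# only for convenience. Borderline memberships are assumptions.
SONORANTS = {"nasal", "trill", "tap", "approximant", "lateral approximant", "vowel"}
OBSTRUENTS = {"plosive", "fricative", "affricate"}
CONTINUANTS = {"fricative", "approximant", "vowel"}
OCCLUSIVES = {"plosive", "affricate", "nasal"}

CLASSES = {
    "sonorant": SONORANTS,
    "obstruent": OBSTRUENTS,
    "continuant": CONTINUANTS,
    "occlusive": OCCLUSIVES,
}


def classes_of(manner):
    """Return, alphabetically, every umbrella term that includes this manner."""
    return sorted(name for name, members in CLASSES.items() if manner in members)


if __name__ == "__main__":
    for manner in ["plosive", "fricative", "nasal", "tap", "vowel"]:
        print(manner, classes_of(manner))
```

Every manner lands in exactly one of the sonorant and obstruent groups, while the continuant and occlusive groups cut across that split: a nasal, for instance, is both a sonorant and an occlusive.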