I ran two different AI models for a local weather forecast project and the results were night and day.

For a small project in Boise, I used a basic text generator and then a model fine-tuned on NOAA data. The fine-tuned one predicted a sudden cold snap three days out that the other one missed completely. What other specific datasets have people found that make this much of a difference for narrow tasks?

3 comments

valallen · 3mo ago

That's a cool example with the NOAA data. Makes me wonder about other fields where the training data is everything. Like, have you seen cases where someone used a model trained on legal documents versus general web text for contract review? Or maybe medical journals for diagnosis help versus a base model. The gap must be huge for stuff that needs exact wording or specific facts.

daniel_martinez84 · 3mo ago

Yeah, exactly what you're saying, @valallen. We tried a base model for pulling clauses from old project reports and it kept making up terms. We switched to one trained on engineering specs and the difference was night and day. It finally understood the exact part numbers and standards.

scott.grace · 2mo ago

Actually, legal models can be a bit of a mixed bag in my experience. Your mileage may vary, but I've seen some that still hallucinate case citations even when trained on court documents, just not as badly as a general model. The training data matters a ton, but the fine-tuning and prompt setup seem to make a big difference too.