18 Conducting Research

Reading papers is one skill. Doing research is another entirely. This chapter is for those who want to move from consumer of AI knowledge to producer: running experiments, writing papers, contributing to the field.

The good news: AI research has never been more accessible. Many important contributions are made by independent researchers, small teams, and graduate students. You do not need a Google-scale budget to ask interesting questions and find interesting answers. Several influential recent methods came from small teams and were built for modest hardware: LoRA and QLoRA made fine-tuning large models dramatically cheaper, and the model-merging toolkit MergeKit began as an independent open-source project.

18.1 Finding a Research Question

The hardest part of research is not running experiments. It is finding the right question to ask. Good research questions share three properties:

Specific: “How can we make AI better?” is not a research question. “Does LoRA fine-tuning on domain-specific data improve factual accuracy on medical question-answering benchmarks?” is.
Answerable: You need to be able to design an experiment that would (in principle) answer the question. “Does GPT-4 truly understand language?” is philosophically interesting but experimentally unanswerable. “Does GPT-4's performance on the ARC benchmark improve with chain-of-thought prompting?” is testable.
Interesting: The answer should matter to someone. Even a perfectly executed study on an unimportant question will not get read.

Where Research Questions Come From

Most good research questions come from one of four sources:

Limitations in existing work: Most papers have a “limitations” section. Those limitations are your research opportunities.
Unexpected observations: You notice something weird while running experiments. Chase it.
Cross-pollination: An idea that works in one area may not yet have been tried in another. Pruning, for example, was studied extensively for computer vision models before it was applied to LLMs. Look for such gaps and try the idea.
Practical needs: You need to solve a real problem and existing methods do not work. Build a solution and evaluate it.

18.2 Designing Experiments

Good experimental design is the difference between a convincing paper and a rejected one.

Baselines: Always compare against strong, well-established baselines. A new method that only beats a strawman baseline is unconvincing. Use the same evaluation setup as prior work so your results are directly comparable.

Controlled variables: Change one thing at a time. If you change the model, the dataset, and the hyperparameters simultaneously, you cannot attribute any improvement to any specific change.

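Statistical significance: Run experiments multiple times with different random seeds and report the mean and standard deviation. A single run can be lucky or unlucky. Three to five seeds is standard practice. When comparing two methods, check whether the gap between them is larger than the seed-to-seed variation; an improvement smaller than the standard deviation is weak evidence.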
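Here is a minimal sketch of this kind of reporting in Python, using only the standard library. The scores are hypothetical. The exact permutation test is one reasonable way to compare two methods over a handful of seeds, but it is not the only one, and it gets expensive as the number of seeds grows.

```python
import itertools
import statistics


def summarize(scores):
    """Return (mean, sample standard deviation) of per-seed scores."""
    return statistics.mean(scores), statistics.stdev(scores)


def permutation_test(a, b):
    """Exact two-sided permutation test for a difference in mean score.

    Under the null hypothesis the method labels are interchangeable, so we
    enumerate every split of the pooled scores into groups of the original
    sizes and count splits whose mean difference is at least as large as
    the observed one. Only practical for small numbers of seeds.
    """
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    extreme = total = 0
    for idx in itertools.combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = abs(statistics.mean(group_a) - statistics.mean(group_b))
        if diff >= observed - 1e-12:  # tolerance for float round-off
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical accuracies (%) from five seeds per method.
baseline = [71.2, 70.8, 71.5, 70.9, 71.1]
new_method = [72.0, 72.4, 71.8, 72.3, 72.1]

for name, scores in [("baseline", baseline), ("new method", new_method)]:
    mean, std = summarize(scores)
    print(f"{name}: {mean:.2f} ± {std:.2f} (n={len(scores)})")

print(f"p = {permutation_test(baseline, new_method):.4f}")
```

With five seeds per method there are only 252 possible splits, so the smallest achievable two-sided p-value is 2/252 (about 0.008). This is one concrete reason why very few seeds limit how strong a claim you can make.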

Ablation studies: For every component in your method, run an experiment with that component removed. This tells you what actually matters and what is decorative.
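One way to keep ablations systematic is to generate the run list from a single configuration. The sketch below assumes each component of a method can be switched on or off with a boolean flag, and the component names are hypothetical placeholders. Each ablation changes exactly one flag, which also enforces the "change one thing at a time" rule.

```python
import itertools


def ablation_configs(full_config):
    """Return the full method plus one variant per enabled component,
    with only that component switched off."""
    runs = [("full", dict(full_config))]
    for component, enabled in full_config.items():
        if enabled:
            variant = dict(full_config)
            variant[component] = False  # remove exactly one component
            runs.append((f"no_{component}", variant))
    return runs


# Hypothetical components of a proposed method.
method = {"data_filtering": True, "curriculum": True, "aux_loss": True}
seeds = [0, 1, 2]

# Cross every configuration with every seed so that the ablations get the
# same statistical treatment as the main result.
schedule = [
    (name, config, seed)
    for (name, config), seed in itertools.product(ablation_configs(method), seeds)
]
print(f"{len(schedule)} runs planned")
```

Generating the schedule up front also makes the cost of an ablation study visible before you spend any compute: here, 4 configurations × 3 seeds = 12 runs.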

Compute budget: Plan your experiments around your actual compute budget. It is better to thoroughly evaluate a method on small models (7B parameters or fewer) across many conditions than to run a single experiment on a 70B model and hope.
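A back-of-the-envelope memory estimate helps you decide which models fit your hardware. The sketch below uses common rules of thumb rather than measurements. Weights take 2 bytes per parameter in fp16/bf16. Full fine-tuning with mixed-precision Adam takes roughly 16 bytes per parameter, covering the weights, gradients, an fp32 master copy, and two optimizer states. The estimate ignores activations, the KV cache, and framework overhead, so real usage is higher.

```python
# Approximate bytes per parameter for storing model weights
# (fp16 and bf16 both use 2 bytes).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "4bit": 0.5}

# Rule of thumb for full fine-tuning with mixed-precision Adam:
# 2 (half-precision weights) + 2 (grads) + 4 (fp32 master) + 8 (two moments).
FULL_FINETUNE_BYTES_PER_PARAM = 16.0


def weights_gb(n_params, dtype="fp16"):
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


def full_finetune_gb(n_params):
    """Weights + gradients + optimizer state; excludes activations."""
    return n_params * FULL_FINETUNE_BYTES_PER_PARAM / 1e9


for size in (7e9, 70e9):
    print(
        f"{size / 1e9:.0f}B params: "
        f"fp16 weights {weights_gb(size):.1f} GB, "
        f"4-bit weights {weights_gb(size, '4bit'):.1f} GB, "
        f"full fine-tune ~{full_finetune_gb(size):.0f} GB"
    )
```

By this estimate, a 7B model's fp16 weights (about 14 GB) only barely fit on a 16 GB T4, with little room left for activations. Full fine-tuning the same model (about 112 GB) is far out of reach on that GPU, which is one reason parameter-efficient and quantized methods are popular on small budgets.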

Reproducibility Crisis in ML

Machine learning has a reproducibility problem. Independent reproduction studies have repeatedly found that a substantial share of published results cannot be reproduced, sometimes even when the original code is available. Common causes: missing hyperparameters, undocumented preprocessing steps, hardware-dependent results, and cherry-picked seeds. Be part of the solution: release your code, document every detail, keep every setting in a version-controlled configuration file rather than scattered across command-line arguments, and seed your random number generators.
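The last two habits are easy to adopt. Below is a minimal Python sketch that assumes a JSON config file and uses hypothetical field names. Every setting lives in one dataclass that is saved next to the results. A single function seeds Python's `random` module and, if they are installed, NumPy's global generator and PyTorch.

```python
import json
import random
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class ExperimentConfig:
    # Hypothetical fields: record every setting that can change a result.
    model_name: str = "my-base-model"
    learning_rate: float = 2e-4
    batch_size: int = 16
    seed: int = 0


def save_config(config, path):
    """Write the config as JSON so it can be committed alongside results."""
    Path(path).write_text(json.dumps(asdict(config), indent=2))


def load_config(path):
    """Rebuild an ExperimentConfig from a saved JSON file."""
    return ExperimentConfig(**json.loads(Path(path).read_text()))


def set_seed(seed):
    """Seed every random number generator the experiment might use."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)  # legacy global NumPy generator
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)  # also seeds CUDA devices in recent PyTorch
    except ImportError:
        pass
```

Call `set_seed(config.seed)` at the very start of each run. Seeding alone does not guarantee bit-identical results on GPUs. PyTorch additionally offers `torch.use_deterministic_algorithms(True)` for that, usually at some cost in speed.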

The Reproducibility Standard

The gold standard for an ML experiment: someone with no connection to your lab, using only your paper and code, should be able to reproduce your main result within a reasonable margin. If they cannot, your experiment is not yet science. Make this your personal standard. Your papers will be more credible, your reviews will be more favorable, and your work will have more lasting impact.

18.3 Running Experiments at Scale (on a Budget)

You do not need a data center to do meaningful ML research, but you do need some compute:

Google Colab: The free tier offers T4 GPUs (16 GB VRAM) when available, which is sufficient for fine-tuning small models and running inference. Paid plans (Colab Pro starts at about $10/month) provide compute units that can be spent on faster GPUs such as the A100, subject to availability.
Lambda Labs, Vast.ai, RunPod: On-demand GPU rental. A single A100 costs roughly $1 to $2 per hour. A two-week intensive research sprint might cost $200 to $500.
University clusters: If you are a student, your university likely has GPU resources you are not using. Ask your advisor or IT department.
Kaggle Notebooks (formerly Kernels): Free GPU access (P100 or T4) with a weekly quota, typically around 30 hours. Often overlooked as a research resource.
Experiment tracking: Use Weights & Biases (free for academics) or MLflow to log metrics, hyperparameters, and artifacts. Your future self will thank you when you need to compare runs from three months ago.
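Rental costs are simple arithmetic, but writing them down before a sprint keeps the budget honest. The rates and usage patterns below are hypothetical inputs, not quotes from any provider.

```python
def sprint_cost(days, gpu_hours_per_day, usd_per_gpu_hour, n_gpus=1):
    """Estimated rental cost in USD for a research sprint."""
    return days * gpu_hours_per_day * usd_per_gpu_hour * n_gpus


# Two weeks on one A100 at $1.50/hour, with jobs running half the day.
print(sprint_cost(days=14, gpu_hours_per_day=12, usd_per_gpu_hour=1.50))

# The same two weeks with the GPU busy around the clock at $2/hour.
print(sprint_cost(days=14, gpu_hours_per_day=24, usd_per_gpu_hour=2.00))
```

The around-the-clock scenario costs more than twice the half-day one. Shutting down idle instances therefore matters as much as finding a low hourly rate.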

18.4 Writing a Research Paper

Once you have results, you need to write them up. The standard ML paper follows the structure described in Chapter 13a (Introduction, Related Work, Method, Experiments, Conclusion), but good writing requires more than following a template.

Tell a story: The best papers have a narrative arc. There is a problem (tension), a key insight (turning point), a method (development), and results (resolution). The reader should feel like they are on a journey, not reading a technical manual.

Lead with the insight, not the method: Readers want to know why your method works before learning how. Start with the intuition, then present the details.

Figures are everything: A great figure can make a paper. The architecture diagram in “Attention Is All You Need” is arguably the most reproduced figure in ML history. Invest time in clear, informative figures.

Be honest about limitations: Reviewers will find the weaknesses in your work. It is far better to acknowledge them yourself and discuss why they do not undermine your contribution than to pretend they do not exist.

The LaTeX Workflow

Almost all ML papers are written in LaTeX. Use the venue's official template (NeurIPS, ICML, ICLR, ACL, etc.). Overleaf makes collaboration easy. BibTeX manages references. TikZ or draw.io can create figures. For tables, use the booktabs package for professional-looking results. Get comfortable with LaTeX early; you will use it for every paper you write.
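Tables of results are easier to keep in sync with your experiments if a script generates them. The sketch below turns per-seed scores into a booktabs table showing mean ± standard deviation. The method names and numbers are hypothetical, and the output assumes `\usepackage{booktabs}` in the preamble.

```python
import statistics


def booktabs_table(results, metric_name="Accuracy"):
    """Render {method: [per-seed scores]} as a LaTeX booktabs tabular."""
    lines = [
        r"\begin{tabular}{lc}",
        r"\toprule",
        rf"Method & {metric_name} \\",
        r"\midrule",
    ]
    for method, scores in results.items():
        mean = statistics.mean(scores)
        std = statistics.stdev(scores)  # sample std across seeds
        lines.append(rf"{method} & ${mean:.1f} \pm {std:.1f}$ \\")
    lines += [r"\bottomrule", r"\end{tabular}"]
    return "\n".join(lines)


print(booktabs_table({
    "Baseline": [71.2, 70.8, 71.5, 70.9, 71.1],
    "Ours": [72.0, 72.4, 71.8, 72.3, 72.1],
}))
```

Writing the output to a `.tex` file and including it with `\input` means a rerun of the experiments updates the paper's table automatically.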

18.5 Finding Mentors and Collaborators

Research is rarely a solo endeavor. Collaborators bring complementary skills, different perspectives, and accountability. Here is how to find them:

Academic advisors: If you are at a university, your advisor is your most important collaborator. Choose someone whose research interests overlap with yours, who publishes regularly, and (equally important) who is responsive and supportive.

Online communities: ML Collective (mlcollective.org) provides research mentorship for independent researchers without academic affiliations. EleutherAI's Discord hosts active research collaborations. Twitter/X is surprisingly effective for finding collaborators: share your work, engage with others' work, and connections form naturally.

Reading groups: Join or start a weekly paper reading group. After a few months, the best discussions will naturally evolve into research collaborations.

Hackathons and sprints: Events like NeurIPS workshops, Kaggle competitions, and HuggingFace sprints are excellent for meeting potential collaborators and demonstrating your skills.

The Cold Email That Works

Want to collaborate with a specific researcher? Send a short, specific email: “I read your paper X. I think combining your method with Y could improve results on Z. I have run preliminary experiments showing [concrete result]. Would you be interested in chatting?” This works far better than vague messages about “being interested in your research.” Lead with what you can contribute, not what you want.

18.6 Research Ethics

AI research carries ethical responsibilities that go beyond academic integrity:

Dual use: Your research may be used in ways you did not intend. A method for generating realistic text can be used for education or for propaganda. A face recognition system can be used for security or for surveillance. Think about potential misuse before publishing.

Bias and fairness: Datasets encode societal biases. Models trained on biased data perpetuate and amplify those biases. Always evaluate your models for demographic disparities and document any biases you find.
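A first check for demographic disparities is to break a headline metric down by group. The sketch below computes per-group accuracy and the largest gap between groups. The labels and group names are hypothetical. Accuracy parity is only one of several fairness criteria, so the right metric depends on the application.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: accuracy} over examples tagged with a group label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {g: correct[g] / total[g] for g in total}


def max_accuracy_gap(per_group):
    """Difference between the best- and worst-served groups."""
    return max(per_group.values()) - min(per_group.values())


# Hypothetical predictions for two demographic groups, "a" and "b".
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0]
groups = ["a", "a", "a", "b", "b", "b"]

per_group = accuracy_by_group(y_true, y_pred, groups)
print(per_group, f"gap = {max_accuracy_gap(per_group):.2f}")
```

Report the per-group numbers along with the gap. A small gap can hide a group that is tiny or poorly served on every metric.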

Environmental impact: Training large models consumes significant energy. Report your training compute and carbon footprint. Consider whether your experiments are necessary at the scale you are running them.
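A rough estimate follows a widely used recipe. Energy is the number of GPUs × average power draw × hours × the data center's power usage effectiveness (PUE). Emissions are that energy × the carbon intensity of the local grid. All inputs below are hypothetical. Measure power where you can and look up the intensity for your region; tools such as CodeCarbon can automate the measurement.

```python
def training_footprint(n_gpus, avg_watts, hours, pue, kg_co2_per_kwh):
    """Return (energy in kWh, emissions in kg CO2-equivalent)."""
    energy_kwh = n_gpus * avg_watts * hours / 1000 * pue  # W*h -> kWh, plus overhead
    return energy_kwh, energy_kwh * kg_co2_per_kwh


# Hypothetical run: 8 GPUs at ~300 W for 100 hours, PUE 1.1, 0.4 kg/kWh grid.
energy, co2 = training_footprint(8, 300, 100, pue=1.1, kg_co2_per_kwh=0.4)
print(f"{energy:.0f} kWh, {co2:.0f} kg CO2e")
```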

Data privacy: If your research uses data from human subjects (even publicly available social media data), check your institution's IRB (Institutional Review Board) requirements. Privacy expectations vary by culture and context.

The Responsible Disclosure Dilemma

What if your research reveals a vulnerability (e.g., a jailbreak that bypasses all safety filters)? Publishing it helps the community develop defenses, but it also teaches bad actors how to exploit it. The security community's practice of responsible disclosure (privately informing the affected party and giving them time to fix the issue before publishing) is increasingly adopted in AI safety research. When in doubt, disclose privately first.

18.7 Submitting and Peer Review

The main venues for ML research include:

Conferences: NeurIPS, ICML, ICLR, AAAI, ACL, EMNLP, CVPR, ECCV. These are the most prestigious venues, with acceptance rates of 20 to 30%.
Workshops: Attached to major conferences. Lower barrier to entry, great for early-career researchers to get feedback and build a network.
Journals: JMLR, TMLR (Transactions on Machine Learning Research). TMLR has an open, rolling review process that is more transparent than conference reviews.
arXiv: Not peer-reviewed, but the primary place where ML papers first appear. Many important papers are never formally published at a conference or in a journal and exist only on arXiv.

Your First Submission

If you are preparing your first paper submission, here is what to expect: the formatting requirements will take longer than you think, the page limit will feel impossible, and the review process will take 2 to 4 months. You will receive 3 to 4 reviews, which will range from insightful to frustrating. Your paper will probably be rejected. This is normal. Even established researchers have papers rejected regularly. Treat every review as feedback, improve the paper, and resubmit.

Handling rejection: Read the reviews carefully, address the concerns, and resubmit to the same or a different venue. The best researchers have drawers full of rejected papers that eventually became influential publications.

18.8 Contributing Without Publishing Papers

Research contributions extend beyond papers:

Open-source code: Release high-quality implementations of papers. This is genuinely valuable and builds your reputation.
Reproducibility studies: Attempt to reproduce published results and document your findings (whether you succeed or fail).
Blog posts and tutorials: Clearly explaining complex ideas is a rare and valued skill. Many researchers are known more for their blog posts than their papers.
Benchmark contributions: Create new evaluation datasets, improve existing benchmarks, or identify flaws in popular benchmarks.
Community participation: Answer questions on forums, review papers for workshops, organize reading groups. The ML community is remarkably open and collaborative.

18.9 Exercises

Identify three limitations from a recent paper in your area of interest. For each, sketch a potential follow-up study that addresses the limitation. Which is the most feasible given your current resources?
Take an existing method (e.g., LoRA fine-tuning) and apply it to a new domain or dataset that has not been studied. Design a complete experimental protocol: baselines, metrics, ablations, and number of seeds.
Practice writing a one-page “extended abstract” describing a research idea. Include motivation, proposed method, expected results, and potential limitations. Share it with a peer for feedback.
Reproduce the main result of a published paper using the authors' code. Document every step, including any issues you encountered. If you cannot reproduce the result, document why.
Submit a paper or extended abstract to a workshop at a major conference. Workshops are one of the best ways to get feedback, especially for early-career researchers.