ICH E1

ICH E1 The Extent of Population Exposure: to Assess Clinical Safety for Drug Intended for Long-Term Treatment of Non-Life-Threatening Conditions

The objective of this guideline is to present an accepted set of principles for the safety evaluation of drugs intended for the long-term treatment (chronic or repeated intermittent use for longer than 6 months) of non-life-threatening diseases. The safety evaluation during clinical drug development is expected to characterise and quantify the safety profile of a drug over a reasonable duration of time consistent with the intended long-term use of the drug. Thus, duration of drug exposure and its relationship to both time and magnitude of occurrence of adverse events are important considerations in determining the size of the data base necessary to achieve such goals.

For the purpose of this guideline, it is useful to distinguish between clinical data on adverse drug events (ADEs) derived from studies of shorter duration of exposure and data from studies of longer duration, which frequently are non-concurrently controlled studies. It is expected that short-term event rates (cumulative 3-month incidence of about 1%) will be well characterised. Events where the rate of occurrence changes over a longer period of time may need to be characterised depending on their severity and importance to the risk-benefit assessment of the drug. The safety evaluation during clinical drug development is not expected to characterise rare adverse events, for example, those occurring in less than 1 in 1000 patients.

The design of the clinical studies can significantly influence the ability to make causality judgements about the relationships between the drug and adverse events. A placebo-controlled trial allows the adverse event rate in the drug-treated group to be compared directly with the background event rate in the patient population being studied. Although a study with a positive or active control will allow a comparison of adverse event rates to be made between the test drug and the control drug, no direct assessment of the background event rate in the population studied can be made. A study that has no concurrent control group makes it more difficult to assess the causality relationship between adverse events observed and the test drug.

There was general agreement on the following:

A harmonised regulatory standard is of value for the extent and duration of treatment needed to provide the safety data base for drugs intended for long-term treatment of non-life-threatening conditions. Although this standard covers many indications and drug classes, there are exceptions.
Regulatory standards for the safety evaluation of drugs should be based on previous experience with the occurrence and detection of adverse drug events (ADEs), statistical considerations of the probability of detecting specified frequencies of ADEs, and practical considerations.
Information about the occurrence of ADEs in relation to duration of treatment for different drug classes is incomplete, and further investigations to obtain this information would be useful.
Available information suggests that most ADEs first occur, and are most frequent, within the first few months of drug treatment. The number of patients treated for 6 months at dosage levels intended for clinical use should be adequate to characterise the pattern of ADEs over time.

To achieve this objective the cohort of exposed subjects should be large enough to observe whether more frequently occurring events increase or decrease over time as well as to observe delayed events of reasonable frequency (e.g., in the general range of 0.5%-5%). Usually 300-600 patients should be adequate.
There is concern that, although they are likely to be uncommon, some ADEs may increase in frequency or severity with time or that some serious ADEs may occur only after drug treatment for more than 6 months. Therefore, some patients should be treated with the drug for 12 months. In the absence of more information about the relationship of ADEs to treatment duration, selection of a specific number of patients to be followed for 1 year is to a large extent a judgement based on the probability of detecting a given ADE frequency level and practical considerations.

Inclusion of 100 patients exposed for a minimum of one year is considered acceptable as part of the safety data base. The data should come from prospective studies appropriately designed to provide at least one year of exposure at dosage levels intended for clinical use. When no serious ADE is observed in a one-year exposure period, this number of patients can provide reasonable assurance that the true cumulative one-year incidence is no greater than 3%.
It is anticipated that the total number of individuals treated with the investigational drug, including short-term exposure, will be about 1500. Japan currently accepts 500-1500 patients: the potential for a smaller number of patients is due to the post-marketing surveillance requirement, the actual number for a specific drug being determined by the information available on the drug and drug class.
There are a number of circumstances where the harmonised general standards for the clinical safety evaluation may not be applicable. Reasons for, and examples of, these exceptions are listed below. It is expected that additional examples may arise. It should also be recognised that the clinical data base required for efficacy testing may be occasionally larger or may require longer patient observation than that required by this guideline.

Exceptions:
1. Instances where there is concern that the drug will cause late developing ADEs, or cause ADEs that increase in severity or frequency over time, would require a larger and/or longer-term safety data base. The concern could arise from:
2. Situations in which there is a need to quantitate the occurrence rate of an expected specific low-frequency ADE will require a greater long-term data base. Examples would include situations where a specific serious ADE has been identified in similar drugs or where a serious event that could represent an alert event is observed in early clinical trials.
3. Larger safety data bases may be needed to make risk/benefit decisions in situations where the benefit from the drug is either (1) small (e.g., symptomatic improvement in less serious medical conditions) or (2) will be experienced by only a fraction of the treated patients (e.g., certain preventive therapies administered to healthy populations) or (3) is of uncertain magnitude (e.g., efficacy determination on a surrogate endpoint).
4. In situations where there is concern that a drug may add to an already significant background rate of morbidity or mortality, clinical trials may need to be designed with a sufficient number of patients to provide adequate statistical power to detect prespecified increases over the baseline morbidity or mortality.
5. In some cases, a smaller number of patients may be acceptable, for example, where the intended treatment population is small.
Filing for approval will usually be possible based on the data from patients treated through 6 months. Data on patients treated through 12 months must be submitted as soon as available and prior to approval in the United States and Japan, but may be submitted after approval in the E.C. In the U.S., the initial submission for those drugs designated as priority drugs must include the 12-month patient data.
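
The patient numbers quoted in this guideline (300-600 patients for delayed events in the 0.5%-5% range, and 100 patients for the 3% one-year bound) can be related to simple binomial arithmetic. The Python sketch below is illustrative only: the guideline does not state how its figures were derived, and the independence assumption, the 95% confidence level and the function names are assumptions introduced here.

```python
def prob_at_least_one(n: int, p: float) -> float:
    """Probability of seeing at least one event among n patients.

    Assumes each patient independently experiences the event with
    true incidence p (a simple binomial model, assumed here).
    """
    return 1.0 - (1.0 - p) ** n


def zero_event_upper_bound(n: int, confidence: float = 0.95) -> float:
    """One-sided upper confidence bound on the incidence when no events
    are observed in n patients (exact binomial bound).

    The 95% default is an assumption; the guideline only speaks of
    "reasonable assurance".
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


if __name__ == "__main__":
    # Chance of observing a delayed event at the ends of the 0.5%-5% range.
    for n in (300, 600):
        for p in (0.005, 0.05):
            print(f"n={n}, incidence={p:.1%}: "
                  f"P(at least one event)={prob_at_least_one(n, p):.3f}")

    # Upper bound on the one-year incidence after 100 event-free patients.
    print(f"n=100, zero events: upper bound={zero_event_upper_bound(100):.4f}")
```

Under these assumptions, 300 patients give roughly a 78% chance of seeing at least one event that occurs in 0.5% of patients, 600 patients give roughly 95%, and zero events among 100 patients corresponds to an upper bound of about 2.95%, in line with the 3% figure quoted above.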

ICH E2

ICH E3

ICH E4

ICH E5

ICH E5(R1) Ethnic Factors in the Acceptability of Foreign Clinical Data

1. INTRODUCTION

The purpose of this guidance is to facilitate the registration of medicines among ICH regions* (see Glossary) by recommending a framework for evaluating the impact of ethnic factors* upon a medicine’s effect, i.e., its efficacy and safety at a particular dosage* and dose regimen*. It provides guidance with respect to regulatory and development strategies that will permit adequate evaluation of the influence of ethnic factors while minimizing duplication of clinical studies and supplying medicines expeditiously to patients for their benefit. This guidance should be implemented in context with the ICH guidances. For the purposes of this document, ethnic factors are defined as those factors relating to the genetic and physiologic (intrinsic*) and the cultural and environmental (extrinsic*) characteristics of a population (Appendix A).

1.1 Objectives

To describe the characteristics of foreign clinical data that will facilitate their extrapolation to different populations and support their acceptance as a basis for registration of a medicine in a new region*.
To describe regulatory strategies that minimize duplication of clinical data and facilitate acceptance of foreign clinical data in the new region.
To describe the use of bridging studies*, when necessary, to allow extrapolation of foreign clinical data to a new region.
To describe development strategies capable of characterizing ethnic factor influences on safety, efficacy, dosage and dose regimen.

1.2 Background

All regions acknowledge the desirability of utilizing foreign clinical data that meet the regulatory standards and clinical trial practices acceptable to the region considering the application for registration.

However, concern that ethnic differences may affect the medication’s safety, efficacy, dosage and dose regimen in the new region has limited the willingness to rely on foreign clinical data. Historically, this has been one of the reasons the regulatory authority in the new region has often requested that all, or much of, the foreign clinical data in support of registration be duplicated in the new region. Although ethnic differences among populations may cause differences in a medicine’s safety, efficacy, dosage or dose regimen, many medicines have comparable characteristics and effects across regions. Requirements for extensive duplication of clinical evaluation for every compound can delay the availability of new therapies and unnecessarily waste drug development resources.

1.3 Scope

This guidance is based on the premise that it is not necessary to repeat the entire clinical drug development program in the new region and is intended to recommend strategies for accepting foreign clinical data as full or partial support for approval of an application in a new region. It is critical to appreciate that this guidance is not intended to alter the data requirements for registration in the new region; it seeks to recommend when these data requirements may be satisfied with foreign clinical data. All data in the clinical data package, including foreign data, should meet the standards of the new region with respect to study design and conduct and the available data should satisfy the regulatory requirements in the new region. Additional studies conducted in any region may be required by the new region to complete the clinical data package.

Once a clinical data package fulfils the regulatory requirements of the new region, the only remaining issue with respect to the acceptance of the foreign clinical data is its ability to be extrapolated to the population of the new region. When the regulatory authority or the sponsor is concerned that differences in ethnic factors could alter the efficacy or safety of the medicine in the population in the new region, the sponsor may need to generate a limited amount of clinical data in the new region in order to extrapolate or “bridge” the clinical data between the two regions.

Thus, an application for registration in the new region would be assessed for:

its completeness with respect to the regulatory requirements of the new region; and
the ability to extrapolate to the new region those parts of the application (which could be most or all of the application) based on studies from the foreign region (Appendix B).

2. ASSESSMENT OF THE CLINICAL DATA PACKAGE INCLUDING FOREIGN CLINICAL DATA FOR ITS FULFILMENT OF REGULATORY REQUIREMENTS IN THE NEW REGION

The regional regulatory authority would assess the clinical data package, including the foreign data, as to whether or not it meets all of the regulatory standards regarding the nature and quality of the data, irrespective of its geographic origin, i.e., data generated either totally in a foreign region (or regions) or data from studies conducted both in a foreign and the new region to which the application is being made. A clinical data package that meets all of these regional regulatory requirements is defined as a “Complete” Clinical Data Package* for submission and potential approval. The acceptability of the foreign clinical data component of the complete data package depends then upon whether it can be extrapolated to the population of the new region.

Before extrapolation can be considered, the Complete Clinical Data Package, including foreign clinical data, submitted to the new region should contain:

Adequate characterization of pharmacokinetics*, pharmacodynamics*, dose-response, efficacy and safety in the population of the foreign region(s).
Clinical trials establishing dose response, efficacy and safety. These trials should:

Be designed and conducted according to regulatory standards in the new region, e.g., choice of controls, and should be conducted according to GCP
Be adequate and well-controlled*
Utilize endpoints that are considered appropriate for assessment of treatment
Evaluate clinical disorders using medical and diagnostic definitions that are acceptable to the new region.

Characterization in a population relevant to the new region of the pharmacokinetics, and where possible, pharmacodynamics and dose response for pharmacodynamic endpoints. This characterization could be performed in the foreign region in a population representative of the new region* or in the new region*.

Several ICH guidelines that address aspects of design, conduct, analysis and reporting of clinical trials will help implement the concepts of the Complete Clinical Data Package. These guidances include GCP (E6), evaluation of dose response (E4), adequacy of safety data (E1 and E2), conduct of studies in the elderly (E7), reporting of study results (E3), general considerations for clinical trials (E8), and statistical considerations (E9). A guidance on the choice of control group in clinical trials (E10) is under development.

2.1 Additional Studies to Meet the New Region’s Regulatory Requirements

When the foreign clinical data do not meet the regional regulatory requirements, the regulatory authority may require additional clinical trials such as:

clinical trials in different subsets of the population such as patients with renal insufficiency, patients with hepatic dysfunction, etc.
clinical trials using different comparators at the new region’s approved dosage and dose regimen
drug-drug interaction studies

3. ASSESSMENT OF THE FOREIGN CLINICAL DATA FOR EXTRAPOLATION TO THE NEW REGION

3.1 Characterization of the Medicine’s Sensitivity to Ethnic Factors

To assess a medicine’s sensitivity to ethnic factors, it is important to have knowledge of its pharmacokinetic and pharmacodynamic properties and the translation of those properties to clinical effectiveness and safety. A reasonable evaluation is described in Appendix C. Some properties of a medicine (chemical class, metabolic pathway, pharmacologic class) make it more or less likely to be affected by ethnic factors (Appendix D). Characterization of a medicine as “ethnically insensitive”, i.e., unlikely to behave differently in different populations, would usually make it easier to extrapolate data from one region to another and need less bridging data.

Factors that make a medicine ethnically sensitive or insensitive will become better understood and documented as effects in different regions are compared. It is clear at present, however, that such characteristics as clearance by an enzyme showing genetic polymorphism and a steep dose-response curve will make ethnic differences more likely. Conversely, a lack of metabolism or active excretion, a wide therapeutic dose range*, and a flat dose response curve will make ethnic differences less likely. The clinical experience with other members of the drug class in the new region will also contribute to the assessment of the medicine’s sensitivity to ethnic factors. It may be easier to conclude that the pharmacodynamic and clinical behaviour of a medicine will be similar in the foreign and new regions if other members of the pharmacologic class have been studied and approved in the new region with dosing regimens similar to those used in the original region.

3.2 Bridging Data Package

3.2.1 Definition of Bridging Data Package and Bridging Study

A bridging data package consists of: 1) selected information from the Complete Clinical Data Package that is relevant to the population of the new region, including pharmacokinetic data, and any preliminary pharmacodynamic and dose-response data, and 2) if needed, a bridging study to extrapolate the foreign efficacy data and/or safety data to the new region.

A bridging study is defined as a study performed in the new region to provide pharmacodynamic or clinical data on efficacy, safety, dosage and dose regimen in the new region that will allow extrapolation of the foreign clinical data to the population in the new region. A bridging study for efficacy could provide additional pharmacokinetic information in the population of the new region. When no bridging study is needed to provide clinical data for efficacy, a pharmacokinetic study in the new region may be considered as a bridging study.

3.2.2 Nature and Extent of the Bridging Study

This guidance proposes that when the regulatory authority of the new region is presented with a clinical data package that fulfils its regulatory requirements, the authority should request only those additional data necessary to assess the ability to extrapolate foreign data from the Complete Clinical Data Package to the new region. The sensitivity of the medicine to ethnic factors will help determine the amount of such data. In most cases, a single trial that successfully provides these data in the new region and confirms the ability to extrapolate data from the original region should suffice and should not need further replication. Note that even though a single study should be sufficient to “bridge” efficacy data, a sponsor may find it practical to obtain the necessary data by conducting more than one study. For example, where a fixed-dose, dose-response study using a clinical endpoint is intended as the bridging study, a short-term pharmacologic endpoint study may be used to choose the dose(s) for the larger (clinical endpoint) study.

When the regulatory authority requests, or the sponsor decides to conduct, a bridging study, discussion between the regional regulatory authority and sponsor is encouraged, when possible, to determine what kind of bridging study will be needed. The relative ethnic sensitivity will help determine the need for and the nature of the bridging study. For regions with little experience with registration based on foreign clinical data, the regulatory authorities may still request a bridging study for approval even for compounds insensitive to ethnic factors. As experience with interregional acceptance increases, there will be a better understanding of situations in which bridging studies are needed. It is hoped that with experience, the need for bridging data will lessen.

The following is general guidance about the ability to extrapolate data generated from a bridging study:

If the bridging study shows that dose response, safety and efficacy in the new region are similar, then the study is readily interpreted as capable of “bridging” the foreign data.
If a bridging study, properly executed, indicates that a different dose in the new region results in a safety and efficacy profile that is not substantially different from that derived in the original region, it will often be possible to extrapolate the foreign data to the new region, with appropriate dose adjustment, if this can be adequately justified (e.g., by pharmacokinetic and/or pharmacodynamic data).
If the bridging study designed to extrapolate the foreign data is not of sufficient size to confirm adequately the extrapolation of the adverse event profile to the new population, additional safety data may be necessary (section 3.2.4).
If the bridging study fails to verify safety and efficacy, additional clinical data (e.g., confirmatory clinical trials) would be necessary.

3.2.3 Bridging Studies for Efficacy

Generally, for medicines characterized as insensitive to ethnic factors, the type of bridging study needed (if needed) will depend upon experience with the drug class and upon the likelihood that extrinsic ethnic factors (including design and conduct of clinical trials) could affect the medicine’s safety, efficacy, and dose-response. For medicines that are ethnically sensitive, a bridging study may often be needed if the populations in the two regions are different. The following examples illustrate types of bridging studies for consideration in different situations:

No Bridging Study

In some situations, extrapolation of clinical data may be feasible without a bridging study:

If the medicine is ethnically insensitive and extrinsic factors such as medical practice and conduct of clinical trials in the two regions are generally similar.

If the medicine is ethnically sensitive but the two regions are ethnically similar and there is sufficient clinical experience with pharmacologically related compounds to provide reassurance that the class behaves similarly in patients in the two regions with respect to efficacy, safety, dosage and dose regimen. This might be the case for well-established classes of drugs known to be administered similarly but not necessarily identically in the two regions.

Bridging Studies Using Pharmacologic Endpoints

If the regions are ethnically dissimilar and the medicine is ethnically sensitive but extrinsic factors are generally similar (e.g., medical practice, design and conduct of clinical trials) and the drug class is a familiar one in the new region, a controlled pharmacodynamic study in the new region, using a pharmacologic endpoint that is thought to reflect relevant drug activity (which could be a well-established surrogate endpoint) could provide assurance that the efficacy, safety, dose and dose regimen data developed in the first region are applicable to the new region. Simultaneous pharmacokinetic (i.e., blood concentration) measurements may make such studies more interpretable.

Controlled Clinical Trials

It will usually be necessary to carry out a controlled clinical trial, often a randomized, fixed dose, dose-response study, in the new region when:

1. there are doubts about the choice of dose,

2. there is little or no experience with acceptance of controlled clinical trials carried out in the foreign region,

3. medical practice (e.g., use of concomitant medications) and design and/or conduct of clinical trials are different, or

4. the drug class is not a familiar one in the new region.

Depending on the situation, the trial could replicate the foreign study or could utilize a standard clinical endpoint in a study of shorter duration than the foreign studies or utilize a validated surrogate endpoint, e.g., blood pressure or cholesterol (longer studies and other endpoints may have been used in the foreign phase III clinical trials).

If pharmacodynamic data suggest that there are interregional differences in response, it will generally be necessary to carry out a controlled trial with clinical endpoints in the new region. Pharmacokinetic differences may not always create that necessity, as dosage adjustments in some cases might be made without new trials. However, any substantial difference in metabolic pattern may often indicate a need for a controlled clinical trial.

When the practice of medicine differs significantly in the use of concomitant medications, or adjunct therapy could alter the medicine’s efficacy or safety, the bridging study should be a controlled clinical trial.
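
The examples in this section can be read as a rough decision procedure. The Python sketch below encodes them for illustration only: the class, field and function names are hypothetical, the order in which the conditions are checked (controlled-trial triggers first) is an assumption made here to resolve overlapping cases conservatively, and situations that the examples do not cover are returned as case-by-case rather than guessed. It is not a substitute for discussion with the regional regulatory authority.

```python
from dataclasses import dataclass
from enum import Enum


class BridgingApproach(Enum):
    NO_BRIDGING_STUDY = "extrapolation may be feasible without a bridging study"
    PHARMACOLOGIC_ENDPOINT_STUDY = "controlled pharmacodynamic study in the new region"
    CONTROLLED_CLINICAL_TRIAL = "controlled clinical trial in the new region"
    CASE_BY_CASE = "not covered by the listed examples"


@dataclass(frozen=True)
class Situation:
    """Hypothetical summary of the factors named in section 3.2.3."""
    ethnically_sensitive: bool
    regions_ethnically_similar: bool
    extrinsic_factors_similar: bool      # medical practice, trial design and conduct
    drug_class_familiar: bool            # familiar in the new region
    related_compound_experience: bool = False
    doubts_about_dose: bool = False
    foreign_trials_accepted_before: bool = True
    pd_differences_observed: bool = False


def suggest_bridging_approach(s: Situation) -> BridgingApproach:
    # Conditions under which the text says a controlled trial will usually
    # be necessary. Checking these first is an assumption of this sketch.
    if (s.doubts_about_dose
            or not s.foreign_trials_accepted_before
            or not s.extrinsic_factors_similar
            or not s.drug_class_familiar
            or s.pd_differences_observed):
        return BridgingApproach.CONTROLLED_CLINICAL_TRIAL

    # From here on, extrinsic factors are similar and the class is familiar.
    if not s.ethnically_sensitive:
        return BridgingApproach.NO_BRIDGING_STUDY
    if s.regions_ethnically_similar and s.related_compound_experience:
        return BridgingApproach.NO_BRIDGING_STUDY
    if not s.regions_ethnically_similar:
        return BridgingApproach.PHARMACOLOGIC_ENDPOINT_STUDY
    return BridgingApproach.CASE_BY_CASE


if __name__ == "__main__":
    example = Situation(
        ethnically_sensitive=True,
        regions_ethnically_similar=False,
        extrinsic_factors_similar=True,
        drug_class_familiar=True,
    )
    print(suggest_bridging_approach(example).value)
```

Returning an explicit case-by-case value, rather than forcing every combination into one of the three named study types, reflects that the guidance presents these situations as examples and not as an exhaustive rule set.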

3.2.4 Bridging Studies for Safety

Even though the foreign clinical data demonstrate efficacy and safety in the foreign region, there may occasionally remain a safety concern in the new region. Safety concerns could include the accurate determination of the rates of relatively common adverse events in the new region and the detection of serious adverse events (in the 1% range and generally needing about 300 patients to assess). Depending upon the nature of the safety concern, safety data could be obtained in the following situations:

A bridging study to assess efficacy, such as a dose-response study, could be powered to address the rates of common adverse events and could also allow identification of serious adverse events that occur more commonly in the new region. Close monitoring of such a trial would allow recognition of such serious events before an unnecessarily large number of patients in the new region is exposed. Alternatively, a small safety study could precede the bridging study to provide assurance that serious adverse effects were not occurring at a high rate.

If there is no efficacy bridging study needed or if the efficacy bridging study is too small or of insufficient duration to provide adequate safety information, a separate safety study may be needed. This could occur where there is:
an index case of a serious adverse event in the foreign clinical data
a concern about differences in reporting adverse events in the foreign region
only limited safety data in the new region arising from an efficacy bridging study, inadequate to extrapolate important aspects of the safety profile, such as rates of common adverse events or of more serious adverse events
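
The statement that serious adverse events in the 1% range generally need about 300 patients to assess can be illustrated with a short sample-size calculation. The Python sketch below is an assumption-laden illustration, not the guidance's own method: it treats patients as independent, takes a 95% chance of observing at least one event as the target, and uses a hypothetical function name.

```python
import math


def patients_needed(incidence: float, detection_prob: float = 0.95) -> int:
    """Smallest number of patients n such that the chance of observing at
    least one event, 1 - (1 - incidence)**n, reaches detection_prob.

    Assumes independent patients with the same true incidence; the 95%
    default is an assumption of this sketch.
    """
    if not 0.0 < incidence < 1.0:
        raise ValueError("incidence must be strictly between 0 and 1")
    if not 0.0 < detection_prob < 1.0:
        raise ValueError("detection_prob must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - detection_prob) / math.log(1.0 - incidence))


if __name__ == "__main__":
    for incidence in (0.01, 0.02, 0.05):
        print(f"incidence={incidence:.0%}: "
              f"patients needed={patients_needed(incidence)}")
```

Under these assumptions, an event with a 1% incidence needs 299 patients for a 95% chance of at least one occurrence, which is consistent with the figure of about 300 patients given above.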

4. DEVELOPMENTAL STRATEGIES FOR GLOBAL DEVELOPMENT

Definition of not only pharmacokinetics but also pharmacodynamics and dose response early in the development program may facilitate the determination of the need for, and nature of, any requisite bridging data. Any candidate medicine for global development should be characterized as ethnically sensitive or insensitive (Appendix D). Ideally, this characterization should be conducted during the early clinical phases of drug development, i.e., human pharmacology and therapeutic exploratory studies. In some cases, it may be useful to discuss bridging study designs with regulatory agencies prior to completion of the clinical data package. However, analysis of the data within the Complete Clinical Data Package will determine the need for, and type of bridging study. For global development, studies should include populations representative of the regions where the medicine is to be registered and should be conducted according to ICH guidelines.

5. SUMMARY

This guidance describes how a sponsor developing a medicine for a new region can deal with the possibility that ethnic factors could influence the effects (safety and efficacy) of medicines and the risk/benefit assessment in different populations. Results from the foreign clinical trials could comprise most, or in some cases, all of the clinical data package for approval in the new region, so long as they are carried out according to the requirements of the new region. Acceptance in the new region of such foreign clinical data may be achieved by generating “bridging” data in order to extrapolate the safety and efficacy data from the population in the foreign region(s) to the population in the new region.

GLOSSARY

Term	Content
Adequate and Well-controlled Trial	An adequate and well-controlled trial has the following characteristics: a design that permits a valid comparison with a control to provide a quantitative assessment of treatment effect; the use of methods to minimize bias in the allocation of patients to treatment groups and in the measurement and assessment of response to treatment; and an analysis of the study results appropriate to the design to assess the effects of the treatment.
Bridging Data Package	Selected information from the Complete Clinical Data Package that is relevant to the population of the new region, including pharmacokinetic data, and any preliminary pharmacodynamic and dose-response data and, if needed, supplemental data obtained from a bridging study in the new region that will allow extrapolation of the foreign safety and efficacy data to the population of the new region.
Bridging Study	A bridging study is defined as a supplemental study performed in the new region to provide pharmacodynamic or clinical data on efficacy, safety, dosage and dose regimen in the new region that will allow extrapolation of the foreign clinical data to the new region. Such studies could include additional pharmacokinetic information.
Complete Clinical Data Package	A clinical data package intended for registration containing clinical data that fulfil the regulatory requirements of the new region and containing pharmacokinetic data relevant to the population in the new region.
Compounds Insensitive to Ethnic Factors	A compound whose characteristics suggest minimal potential for clinically significant impact by ethnic factors on safety, efficacy, or dose response.
Compounds Sensitive to Ethnic Factors	A compound whose pharmacokinetic, pharmacodynamic, or other characteristics suggest the potential for clinically significant impact by intrinsic and/or extrinsic ethnic factors on safety, efficacy, or dose response.
Dosage	The quantity of a medicine given per administration, or per day.
Dose Regimen	The route, frequency and duration of administration of the dose of a medicine over a period of time.
Ethnic Factors	The word ethnicity is derived from the Greek word “ethnos”, meaning nation or people. Ethnic factors are factors relating to races or large populations grouped according to common traits and customs. Note that this definition gives ethnicity, by virtue of its cultural as well as genetic implications, a broader meaning than racial. Ethnic factors may be classified as either intrinsic or extrinsic (Appendix A). Extrinsic Ethnic Factors: Extrinsic ethnic factors are factors associated with the environment and culture in which a person resides. Extrinsic factors tend to be less genetically and more culturally and behaviourally determined. Examples of extrinsic factors include the social and cultural aspects of a region such as medical practice, diet, use of tobacco, use of alcohol, exposure to pollution and sunshine, socio-economic status, compliance with prescribed medications, and, particularly important to the reliance on studies from a different region, practices in clinical trial design and conduct. Intrinsic Ethnic Factors: Intrinsic ethnic factors are factors that help to define and identify a sub-population and may influence the ability to extrapolate clinical data between regions. Examples of intrinsic factors include genetic polymorphism, age, gender, height, weight, lean body mass, body composition, and organ dysfunction.
Extrapolation of Foreign Clinical Data	The generalization and application of the safety, efficacy and dose response data generated in a population of a foreign region to the population of the new region.
Foreign Clinical Data	Foreign clinical data is defined as clinical data generated outside of the new region (i.e., in the foreign region).
ICH Regions	European Union, Japan, The United States of America.
New Region	The region where product registration is sought.
Population Representative of the New Region	A population that includes the major racial groups within the new region.
Pharmacokinetic Study	A study of how a medicine is handled by the body, usually involving measurement of blood concentrations of drug and its metabolite(s) (sometimes concentrations in urine or tissues) as a function of time. Pharmacokinetic studies are used to characterize absorption, distribution, metabolism and excretion of a drug, either in blood or in other pertinent locations. When combined with pharmacodynamic measures (a PK/PD study) it can characterize the relation of blood concentrations to the extent and timing of pharmacodynamic effects.
Pharmacodynamic Study	A study of a pharmacological or clinical effect of the medicine in individuals to describe the relation of the effect to dose or drug concentration. A pharmacodynamic effect can be a potentially adverse effect (anticholinergic effect with a tricyclic), a measure of activity thought related to clinical benefit (various measures of beta-blockade, effect on ECG intervals, inhibition of ACE or of angiotensin I or II response), a short term desired effect, often a surrogate endpoint (blood pressure, cholesterol), or the ultimate intended clinical benefit (effects on pain, depression, sudden death).
Population Pharmacokinetic Methods	Population pharmacokinetic methods are a population-based evaluation of measurements of systemic drug concentrations, usually two or more per patient under steady state conditions, from all, or a defined subset of, patients who participate in clinical trials.
Therapeutic Dose Range	The difference between the lowest effective dose and the highest dose that gives further benefit.

APPENDIX A

Classification of intrinsic and extrinsic ethnic factors

APPENDIX B

Assessment of the clinical data package (CDP) for acceptability

APPENDIX C

Pharmacokinetic, Pharmacodynamic, and Dose Response Considerations

Evaluation of the pharmacokinetics and pharmacodynamics, and their comparability, in the three major racial groups most relevant to the ICH regions (Asian, Black, and Caucasian) is critical to the registration of medicines in the ICH regions. Basic pharmacokinetic evaluation should characterize absorption, distribution, metabolism, excretion (ADME), and where appropriate, food-drug and drug-drug interactions.

Adequate pharmacokinetic comparison between populations of the two regions allows rational consideration of what kinds of further pharmacodynamic and clinical studies (bridging studies) are needed in the new region. In contrast to the pharmacokinetics of a medication, where differences between populations may be attributed primarily to intrinsic ethnic factors and are readily identified, the pharmacodynamic response (clinical effectiveness, safety, and dose-response) may be influenced by both intrinsic and extrinsic ethnic factors and this may be difficult to identify except by conducting clinical studies in the new region.

The ICH-E4 document describes various approaches to dose-response evaluation. In general, dose-response (or concentration response) should be evaluated for both pharmacologic effect (where one is considered pertinent) and clinical endpoints in the foreign region. The pharmacologic effect, including dose-response, may also be evaluated in the foreign region in a population representative of the new region. Depending on the situation, data on clinical efficacy and dose-response in the new region may or may not be needed, e.g., if the drug class is familiar and the pharmacologic effect is closely linked to clinical effectiveness and dose-response, these foreign pharmacodynamic data may be a sufficient basis for approval and clinical endpoint and dose-response data may not be needed in the new region. The pharmacodynamic evaluation, and possible clinical evaluation (including dose-response) is important because of the possibility that the response curve may be shifted in a new population. Examples of this are well-documented, e.g., the decreased response in blood pressure of blacks to angiotensin-converting enzyme inhibitors.
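
The idea that a response curve "may be shifted in a new population" can be illustrated with a small numerical sketch. The code below assumes a simple Emax dose-response model, which is a common modelling choice but is not prescribed by this guidance. The function names and all parameter values are hypothetical and chosen only for illustration.

```python
def emax_effect(dose, e0, emax, ed50):
    """Emax model: effect = E0 + Emax * dose / (ED50 + dose)."""
    return e0 + emax * dose / (ed50 + dose)


def dose_for_effect(target, e0, emax, ed50):
    """Invert the Emax model to find the dose giving a target effect."""
    delta = target - e0
    if not 0 <= delta < emax:
        raise ValueError("target effect is outside the range the model can reach")
    return ed50 * delta / (emax - delta)


# Hypothetical parameters (not taken from any real medicine).
foreign = {"e0": 0.0, "emax": 20.0, "ed50": 10.0}
new_region = {"e0": 0.0, "emax": 20.0, "ed50": 20.0}  # same Emax, ED50 doubled

# Compare the two curves over a range of doses.
for dose in (5, 10, 20, 40, 80):
    print(dose,
          round(emax_effect(dose, **foreign), 1),
          round(emax_effect(dose, **new_region), 1))

# Dose needed in each population to reach the same target effect.
target = 10.0
print(dose_for_effect(target, **foreign), dose_for_effect(target, **new_region))
```

Under these assumed parameters the maximum effect is unchanged, but the new population needs a higher dose to reach the same effect. That is the kind of difference a pharmacodynamic or dose-response bridging study is meant to detect.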

APPENDIX D

A Medicine’s Sensitivity to Ethnic Factors

Characterization of a medicine according to the potential impact of ethnic factors upon its pharmacokinetics, pharmacodynamics and therapeutic effects may be useful in determining what sort of bridging study is needed in the new region. The impact of ethnic factors upon a medicine’s effect will vary depending upon the drug’s pharmacologic class and indication and the age and gender of the patient. No one property of the medicine is predictive of the compound’s relative sensitivity to ethnic factors. The type of bridging study needed is ultimately a matter of judgement but assessment of sensitivity to ethnic factors may help in that judgement.

The following properties of a compound make it less likely to be sensitive to ethnic factors:

Linear pharmacokinetics (PK)
A flat pharmacodynamic (PD) (effect-concentration) curve for both efficacy and safety in the range of the recommended dosage and dose regimen (this may mean that the medicine is well-tolerated)
A wide therapeutic dose range* (again, possibly an indicator of good tolerability)
Minimal metabolism or metabolism distributed among multiple pathways
High bioavailability, thus less susceptibility to dietary absorption effects
Low potential for protein binding
Little potential for drug-drug, drug-diet and drug-disease interactions
Non-systemic mode of action
Little potential for inappropriate use

The following properties of a compound make it more likely to be sensitive to ethnic factors:

Non-linear pharmacokinetics
A steep pharmacodynamic curve for both efficacy and safety (a small change in dose results in a large change in effect) in the range of the recommended dosage and dose regimen
A narrow therapeutic dose range
Highly metabolized, especially through a single pathway, thereby increasing the potential for drug-drug interaction
Metabolism by enzymes known to show genetic polymorphism
Administration as a prodrug, with the potential for ethnically variable enzymatic conversion
High inter-subject variation in bioavailability
Low bioavailability, thus more susceptible to dietary absorption effects
High likelihood of use in a setting of multiple co-medications
High likelihood for inappropriate use, e.g., analgesics and tranquilizers.

ICH E5

ICH E5(R1) 接受国外临床试验数据的种族因素

1. 前言

本指南的目的，是推荐一个用于评估种族因素对药物疗效的影响的框架，即某一特定剂量和给药方案对该药的安全性和有效性的影响，从而帮助药品在国际协调会议（ICH）地区注册。本指南为监管和研发策略提供指导，尽可能减少重复临床研究，尽快为患者提供药物使其获益的同时，又对种族因素的影响进行了充分的评估。本指南将与其他ICH 指南一起实施。根据本文件的目的，种族因素被定义为人群中与遗传和生理因素（内因）、以及文化和环境（外因）特征有关的因素（附录A）。

1.1 目的

描述国外临床试验数据的特征，以便将其外推到不同人群，支持药品在新地区注册。
尽量减少重复的临床研究，及促进新地区接受国外临床试验数据的监管策略。
应用桥接研究，必要时允许将国外临床试验数据外推到新地区。
能够表征种族因素对安全性、有效性、剂量和给药方案的影响的研发策略。

1.2 背景

申请注册时，所有地区都接受符合法规标准及该地区申请注册要求的国外临床试验数据。

但是种族差异可能影响药物的安全性、有效性、剂量和给药方案，使得新地区对国外临床试验数据的接受程度受到限制。这也是以往在新地区提交注册申请时，监管机构要求其在新地区完全或大部分重复国外临床研究和验证的原因之一。虽然不同人群之间的种族差异可能导致药物的安全性、疗效、剂量或给药方案的差异，但很多药物在不同地区的人群之间具有相似的特征和作用。要求对每一个药物进行大量重复的临床研究，可能会延迟新疗法的应用和浪费不必要的药物研发资源。

1.3 范围

本指南的前提是，没有必要在新地区重复进行药物的全部临床研发过程，建议全部或部分的接受国外临床研究数据，以支持药物在新地区的注册审批。首先需要申明的是，本指南并不是为了在新地区申请注册药品而修改对临床数据的要求，而是旨在国外临床研究数据可能符合新地区的注册要求时，推荐接受国外临床资料。临床数据集的所有数据资料，包括国外数据，必须符合新地区的研究设计和实施标准，遵循新地区的监管要求。新地区可要求申办者在该地区进行附加研究以完善临床数据集。

若现有临床数据集符合新地区的管理要求，这些国外数据是否能被接受，还取决于该数据能否外推到新地区的人群。当监管机构或申办者认为种族因素可能改变药物在新地区人群中的安全性或有效性时，申办者可能需要在新地区获得一定的临床数据，以便将两个地区之间的临床数据外推或桥接起来。

如果申办者需要获得额外的临床数据，以满足新地区的监管要求，则可以将这些临床试验设计成桥接研究（bridging study）。

因此，申办者和新地区的监管机构对于该项注册申请需要评估的内容包括：

（1）完全符合新地区监管要求；

（2）将国外临床研究中的部分（大部分或全部）数据应用到新地区的可能性。（附录B）

2. 评估包括国外临床数据的临床数据集，以满足新地区的监管要求

新地区的监管机构将评估包括国外临床数据的临床数据集，以确定该数据的性质和质量是否符合该地区所有的监管标准，而不考虑该数据全部来源于国外、部分来源于国外、还是部分来源于即将申请注册的地区。可通过审批的完整临床数据集的定义为，符合所有地区监管要求的临床数据集。完整临床数据集中的国外临床数据是否被接受，取决于它能否外推到新地区人群中。

在考虑外推之前，递交给新地区的包括国外临床数据的完整临床数据集应当包括以下内容：

国外人群的药动学、药效学、量-效关系、安全性和有效性特点；
确立药物量-效关系、有效性和安全性的临床研究。这些研究包括：

根据新地区监管标准进行设计和实施，例如对照药的选择，且遵照临床试验管理规范（GCP）实施；
研究充分，且具有良好的对照；
采用合适的治疗终点进行评价；
疾病评估时所采用的治疗和诊断的定义能够被新地区所接受。

描述新地区人群的药动学特征，以及（在可能的情况下）药效学特征和以药效学为终点指标的量-效关系特征。应在能够代表新地区人群的国外人群，或在新地区人群中开展研究。

针对临床试验的设计、实施、分析和报告各方面的一系列ICH 指南，将有助于实现完整临床数据集的概念。这些指南包括 GCP（E6）、量-效关系评估（E4）、充分的安全性数据（E1 和E2）、老年用药研究（E7）、研究结果的报告（E3）、一般临床试验的总体考虑（E8）及统计学考虑（E9）。临床试验对照组的选择指导原则（E10）尚待完善。

2.1 根据新地区监管要求的附加临床研究

当国外临床资料不符合新地区的监管要求时，新地区的监管机构可能会要求增加临床试验，例如：

增加在特殊人群中的临床试验：如肾功能不全及肝功能不全的病人等；
按照新地区批准的剂量和给药方法，以不同的对照药进行临床试验；
药物相互作用研究。

3. 评估国外临床试验数据从而外推到新地区

3.1 药物的种族敏感性特征

评估一个药物对种族因素的敏感性，必须了解它的药动学和药效学特征以及应用这些特征解释临床安全性和有效性。附录 C 中描述了一种合理的评估方法。药物的某些特性（化学分类，代谢途径，药理学分类）决定了该药物更容易或不易受到种族因素的影响（附录D）。药物对种族因素不敏感，即不太可能在不同人群中表现出差异，通常会使国外临床数据更容易由一个地区外推到另一个地区中，并且需要较少的桥接数据。

通过比较不同地区的药物效应，药物对种族因素是否敏感将变得更容易理解和评价。然而，显而易见的是，若药物的代谢酶存在基因多态性、量-效曲线陡峭，将更可能存在种族差异。相反，药物缺乏代谢或主动排泄、治疗窗宽、以及量-效曲线平缓，则不易出现种族差异。同类药物在新地区应用的临床经验，也有助于评估药物对种族因素的敏感性。如果在新地区中，相同药理学特征的同类药物，采用与原地区相同的剂量和给药方案进行研究并获批，可能较容易得出药物的药效学和临床行为在国外和新地区也是类似的结论。

3.2 桥接数据集

3.2.1 桥接数据集和桥接研究的定义

一个桥接数据集包括：（1）从完整临床数据集中选出与新地区人群相关的信息，包括药动学数据，药效学和量-效关系数据；（2）有可能需要将国外的有效性和/或安全性数据外推到新地区的桥接研究数据。

桥接研究的定义是，在新地区进行的一项研究，旨在提供新地区的有效性、安全性、剂量和给药方案的药效学或临床数据，从而能够将国外临床数据外推到新地区人群。有效性的桥接研究可以为新地区人群提供额外的药动学信息。当不需要桥接研究提供临床疗效数据时，在新地区进行的药动学研究即可看做是桥接研究。

3.2.2 桥接研究的类型与范围

本指南建议，当新地区监管机构收到符合其监管要求的临床数据集时，应要求递交完整临床数据集中可外推到新地区的必须附加的数据资料集。药物是否存在种族敏感性，决定了这些数据的数量。多数情况下，在新地区进行一个单独的临床试验就能提供这些数据，以满足由原地区外推到新地区的需要，不必再进一步开展重复研究。值得注意的是，虽然有时一个桥接研究就足够用来桥接药物的有效性数据，但实际上申办者可能需要开展更多的研究以获得必要的资料。例如，如果需要采用临床终点，固定剂量的量-效关系研究作为桥接研究，可以用短期的临床药理学终点研究来为较大规模（临床终点）的研究选择给药剂量。

当监管机构要求，或申办者决定实施一个桥接研究，在可能的情况下，鼓励他们通过讨论决定实施何种类型的桥接研究。相对的种族敏感性有助于决定是否进行桥接研究以及进行何种类型的桥接研究。对于没有使用国外数据进行药品注册的经验的地区，即使该药物无种族敏感性，监管机构仍应要求一个桥接研究。当地区间互相接受数据的经验增加，对桥接研究的必要性会有更好的认识。希望随着经验丰富，减少对桥接研究数据的需求。

基于桥接研究数据进行外推的总体指南如下：

如果桥接研究证实在新地区的量-效关系、安全性和有效性与国外相似，则该研究即可说明其能够桥接国外数据。
如果一个实施恰当的桥接研究表明，在新地区不同剂量下的安全性和有效性结果与原地区没有较大差异，通常可将国外数据外推到新地区，也可通过适当的剂量调整（采用药动学和/或药效学数据）将国外数据外推到新地区。
如果桥接研究的规模不能充分描述新地区的不良反应情况，从而将国外数据外推至新地区，则必须增加安全性数据（3.2.4 部分）。
如果桥接研究未能验证药物的安全性和有效性，则需要额外的临床数据（如确证性临床研究）。

3.2.3 有效性的桥接研究

通常对种族因素不敏感的药物，所需桥接研究（如果需要）的类型取决于此类药物的用药经验和外在的种族因素（包括临床试验的设计与实施）对药物安全性、有效性以及量-效关系可能存在的影响。对于种族因素敏感的药物，如果两个地区人群有差异，通常需要桥接研究。以下示例说明了不同情况下应考虑采用的桥接研究类型：

无桥接研究

如果药物对种族不敏感，并且外在因素如医疗措施和临床试验的实施在两地区大致相同。

如果药物对种族敏感，但两地区种族相似，并且药理机制类似的药物有足够的临床经验，可保证该类药物在两地区病人中的安全性、有效性、剂量和给药方案方面相似。这可能是对于给药方式类似的同类药物的情况，同类药物在两个地区应用情况类似，但不一定相同。

临床药理学终点的桥接研究

如果两地区之间有种族差异，而且药物对种族敏感，但外在因素大致相同（如医疗实践，临床试验设计和实施），且该类药物在新地区有临床经验，则在新地区采用反映药物活性的药理学终点（经过验证的替代终点）进行对照的药效学研究，可保证在原地区建立的有效性、安全性、剂量和给药方案适用于新地区。同时药动学（即血药浓度）监测可使这些研究更有说服力。

对照临床试验

在以下情况，通常需要在新地区进行对照试验，常为随机、固定剂量的量-效关系研究：

1. 对剂量的选择有疑问时；

2. 缺乏接受国外对照临床试验数据的经验；

3. 医学实践不同如合并用药不同，临床试验的设计和/或实施不同；

4. 新地区对此类药物不熟悉。

根据这些具体情况，可以重复国外临床研究，或采用标准的临床终点进行短期研究，或采用经过验证的替代终点，如血压或胆固醇（国外三期临床试验中可能已采用更长时间的研究和其他终点）。

如果药效学数据提示地区间疗效有差异，通常有必要在新地区进行一项临床终点的对照试验。药动学的差异并不一定需要进行这样的对照试验，因为在某些情况下，只需调整剂量而不需要进行新的临床试验，但代谢方式存在本质区别时，通常提示需要进行对照临床试验。

当医学实践在合并用药方面存在显著差异，或者辅助治疗可能改变药物的安全性或有效性时，那么桥接研究应为一项对照临床试验。

3.2.4 安全性的桥接研究

即使国外临床数据已表明药物在国外应用的有效性和安全性，有时在新地区仍可能出现需要关注的安全性问题，这包括对新地区常见不良事件发生率的精确估计，以及严重事件的发现（1%的发生率通常需要评估约300 例患者）。依据安全性问题的性质，在下述情况下需要获得安全性资料：

评估有效性的桥接研究（例如量-效关系研究），因能估计常见的不良反应发生率，也可识别新地区更常见的严重不良事件，从而更具有说服力。在新地区对这样的试验进行密切监测，将有助于识别这类严重事件，避免药物暴露到新地区的大量患者当中。或者，可以在桥接研究之前进行一项小规模的安全性研究，以确保严重不良反应不会高频率发生。

如果不需要有效性桥接研究，或有效性桥接研究规模过小，或研究时间过短不足以提供充分的安全性信息，则可能需要进行单独的安全性研究，可见于以下情况：
国外临床数据中有严重不良反应的病例；
新地区与国外报道的不良反应存在差异的；
新地区只有有限的来源于药效桥接研究的安全性数据，不足以外推到安全性的重要方面，例如，常见不良反应发生率或更严重的不良反应。
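
The figure quoted in Section 3.2.4 above (about 300 patients to evaluate a 1% event rate) can be checked with a short calculation. The sketch below assumes that events occur independently with a constant per-patient probability, and that "evaluating" the rate means having roughly a 95% chance of observing at least one event. Both are assumptions made here for illustration, since the guidance does not state the confidence level. The function names are hypothetical.

```python
import math


def prob_at_least_one(n_patients, event_rate):
    """Probability of seeing at least one event among n independent patients."""
    return 1.0 - (1.0 - event_rate) ** n_patients


def patients_needed(event_rate, confidence=0.95):
    """Smallest n such that at least one event is observed with the given probability."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - event_rate))


# A 1% event rate, as in the text; 95% is an assumed confidence level.
print(patients_needed(0.01))
print(round(prob_at_least_one(300, 0.01), 3))
```

With these assumptions the required number comes out just under 300, which agrees with the approximate figure in the text.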

4. 全球研发策略

在研发早期，药动学、药效学和量-效关系的确定，可能有助于确定进行何种桥接研究及其必要性。全球研发的候选药物可能具有种族敏感或种族不敏感的特征（附录D）。理想情况下，这些研究应在药物研发的早期阶段，例如临床药理学和药效探索研究期间进行。某些情况下，在完成临床资料收集之前，与监管机构讨论桥接研究的设计非常有益。但是，对完整临床数据集中的数据进行分析，将确定进行何种桥接研究及其必要性。为了全球研发，研究应包含即将注册地区的代表性人群，并按照ICH 规范实施。申办者可能希望在药物研发的后期评估新地区相关人群的药动学、药效学、剂量和给药方案。药动学评估可采用正式的药动学研究，或在新地区采用群体药动学方法，或在新地区的相关人群中进行临床试验。

5. 总结

本指南阐述申办者在新地区进行药品注册时，如何处理不同人群中，种族因素可能对药物作用（有效性和安全性）及风险/效益评估的问题。只要国外临床试验数据是按新地区的监管要求获得的，那么其结果可构成大部分，有时甚至是全部的临床数据集，从而用于药物在新地区的注册。在新地区中接受这样的国外临床数据，可通过桥接研究来实现，以便将国外人群中获得的安全性和有效性数据外推到新地区。

词汇表

术语	含义
充分良好的对照试验	一个充分和良好的对照试验具备以下特征：设计正确的对照作为比较，以提供对治疗效应量化的评估；治疗组的患者分配，以及检测和评估治疗效应时，尽量减少偏倚；按照试验设计，对研究结果进行恰当的分析，以评估治疗效果。
桥接数据集	从完整临床数据集中选出的与新地区人群相关的信息，包括药动学数据和任何原始的药效学数据、以及量-效关系数据。如有必要，从新地区桥接研究中获得的额外数据，可促进国外的有效性和安全性数据外推到新地区人群。
桥接研究	桥接研究的定义是，在新地区进行的补充研究，提供新地区的安全性、有效性、剂量和给药方案的药效学或临床数据。促使国外临床数据外推到新地区。这些研究可能包括补充的药动学数据。
完整的临床数据集	用于注册的、含有符合新地区监管要求的临床数据，包含与新地区人群相关的药动学数据。
对种族因素不敏感的药物	种族因素对其安全性、有效性或量-效关系特征影响很小，无显著临床意义的药物。
对种族因素敏感的药物	药动学、药效学或其他特征表明，其内在和/或外在的种族因素对安全性、有效性或量-效关系可能有显著的、具有临床意义的影响的药物。
剂量	每次或每天的用药数量。
给药方法	在一段时间内，给药剂量、给药途径、频率和给药间隔。
种族因素	“种族”一词，来源于希腊语（ethnos），意思是民族或人民。种族因素是与种族相关，或根据共同特征和习惯聚集的大规模人群。种族的定义包含有民族文化和遗传学的意义，比racial 更广义。种族因素又可分为内在因素和外在因素（附录A）。外在的种族因素：外在的种族因素是与人群居住地环境和文化相关的因素。外在的因素大多由文化和行为所决定，而较少由遗传因素决定。举例来说，外在的因素包括该地区社会与文化的各方面，例如医疗实践、饮食、吸烟、饮酒、大气污染和阳光照射、社会经济状况、服药依从性，尤其重要的是，对其他地区研究数据、临床试验设计和实施情况的接受程度。内在的种族因素：内在的种族因素是有助于判断和鉴别亚群的因素，并可能影响临床数据在地区间外推的可能性，例如，基因多态性，年龄，性别，身高，体重，肌肉含量，身体构成和器官功能不全。
国外临床数据的外推	由国外人群中获得的安全性、有效性和量-效关系数据推广应用到新地区人群中。
国外临床数据	在新地区以外（如国外地区）产生的临床数据。
ICH 地区	欧盟，日本，美国、加拿大、瑞士、巴西、中国、新加坡、韩国。
新地区	药物即将申请注册的地区。
新地区的人群代表	在新地区中，包括主要种族群体的人群。
药动学研究	研究药物在体内的处置过程，包括检测血液中药物及其代谢物浓度（有时检测尿和组织中的浓度）随时间的变化情况。药动学是研究药物在血液或其他相关部位的吸收、分布、代谢和排泄特征。当与药效学检测（PK/PD 研究）结合时，能够反映药物浓度与药效学作用的程度与时间的关系。
药效学研究	研究药物在个体中的药理效应或临床疗效，从而描述药物浓度或剂量与药物效应的关系。药物效应有可能是潜在的不良反应（如三环类的抗胆碱作用），对其活性的测定可能与临床获益（如对β 受体阻滞剂的测定，对ECG 间期的影响，ACE 或血管紧张素I/II 的抑制作用）、预期的短期疗效，通常是替代终点（血压、胆固醇），或最终预期的临床获益有关（如对疼痛、抑郁、猝死的影响）。
群体药动学研究	群体的药动学研究，是基于对群体的系统药物浓度测定的评估。通常，从参加临床试验的所有或某个亚群中，每个患者选择两个或以上的稳态样本进行研究。
治疗剂量范围	最低有效剂量与可获得的最大效应的剂量之间的范围。

附录A

内在和外在种族因素的分类

附录B

临床数据集（CDP）可接受性的评估

附录C

对药动学、药效学和量-效关系的考虑

评估ICH 区域最相关的三个主要种族群体（亚洲人、黑种人和白种人）的药动学、药效学以及它们的可比性，对于ICH 地区的药品注册至关重要。基本的药动学评估需阐明药物的吸收、分布、代谢、排泄（ADME）特征，以及合适的药物-食物、药物-药物相互作用。

两个地区之间充分的药动学比较，将有助于在新地区合理选择进一步的药效学研究和临床研究（桥接研究）的种类。人种间药动学的差异可能与内在的种族因素有关，且这种差异易于鉴别。与之不同的是，药效学反应（临床疗效、安全性和量-效关系）可能受到内在、外在种族因素的影响，且很难鉴别，除非在新地区进行临床试验。

ICH-E4 指南阐述了多种剂量-效应关系评估的方法。通常，国外研究应通过药理效应（被认为是相关的）和临床终点来评估剂量-效应（或浓度-效应）关系。也可以在国外能代表新地区的研究中评估药理效应，包括量-效关系。根据不同情况，新地区不一定需要临床疗效和量-效关系相关的数据。例如，当新地区对某类药物较熟悉，且药理效应与临床疗效、量-效关系密切相关时，这些国外药效学数据就足以用于申请注册，那么新地区可能不需要临床终点研究和量-效关系研究的数据。由于新地区的人群中量-效关系曲线可能会迁移，药效学评价和可能的临床终点评价（包括量-效关系）就显得非常重要。这方面已有确切的案例报道，例如黑人的血压对血管紧张素转换酶抑制剂的反应低下。

附录D

对种族因素敏感的药物

根据种族因素对药物的药动学、药效学以及治疗作用的潜在影响而得出的药物特征，有助于决定在新地区选择何种桥接研究。种族因素对药物的影响取决于该药的药理学分类、适应证和患者的年龄与性别。药物的任何单一特性都不能预测其对种族因素的相对敏感性。桥接研究种类的选择是一个最终决策，而对药物种族敏感性的分析有助于作出决策。

以下所述的特性，提示药物可能对种族因素不敏感：

线性药动学（PK）；
在推荐剂量和给药方案范围内，有效性和安全性均呈平缓的药效学（PD）曲线（浓度-效应）（这意味着药物有较好的耐受性）；
治疗窗宽（可能也是一个耐受性较好的指标）；
较少代谢，或通过多种途径代谢；
生物利用度高，不受饮食吸收作用的影响；
蛋白结合率低；
药物-药物，药物-食物，药物-疾病的相互作用小；
局部起效；
被不恰当使用的几率小。

以下所述的特性，提示药物对种族因素敏感：

非线性药动学；
在推荐剂量和给药方案范围内，有效性和安全性均呈陡峭的药效学曲线（很少的剂量变化引起极大的效应改变）；
治疗窗窄；
代谢率高，特别是通过单一途径代谢，增加了药物-药物相互作用的可能性；
药物代谢酶具有遗传多态性；
前体药物，其转化酶具有潜在的种族差异；
生物利用度的个体间差异大；
生物利用度低，易受饮食吸收作用的影响；
在多种药物联合时，使用率较高；
易被不恰当使用，如镇痛及镇定药。

ICH E5

ICH E5(R1) Implementation Working Group Questions & Answers

No.	Date of Approval	Questions	Answers
1	Nov. 2003	I am planning to develop my new drug globally. Does E5 provide guidance for this approach?	E5 does provide some guidance in this situation. E5 addresses primarily how development programs in one or two regions might support approval in another region. E5 says, in general, that if the data developed in one region satisfy the requirements for evidence in a new region, but there is a concern about possible intrinsic or extrinsic ethnic differences between the two regions, then it should be possible to extrapolate the data to the new region with a single bridging study. The bridging study could be a pharmacodynamic study or a full clinical trial, possibly a dose-response study. The bridging study would allow extrapolation of an adequate data base to the new region. It would seem possible, and efficient, to assess potential regional differences as part of a global development program, i.e. for development of data to occur simultaneously in various regions, rather than sequentially. For example, if multi-regional trials had a sufficient number of trial subjects from the new region, it might be possible to analyze the impact of ethnic differences in those studied, to determine whether the entire data base is pertinent to the new region. The basic issues to be considered in a global study design that could affect a region's willingness to rely on these data are: a) definition and diagnoses of disease condition and patient, b) choice of control group, c) regional target or objective of treatment with choice of efficacy variables, d) methods of assessment of safety, e) medical practice, f) duration of the trial, g) regional concomitant medications, h) severity distribution of eligible subjects, and i) similarity of dose and dose regimens. To determine whether your proposed global program will address the requirements of a specific region, it is recommended that early consultation and discussions be held with regulatory authorities in that region.
2	Nov. 2003	I have developed my drug in one region, addressing safety, efficacy, dosing, etc., as well as use in special populations such as patients with renal/hepatic impairment, the elderly, children, and pregnant and lactating women. If I can successfully demonstrate (e.g. through a bridging study) that my safety, efficacy and dosing information in the general population are relevant to the new region, will I also need to further address the extrapolatability of the special population data?	In general, if the studies of special populations are sufficient in design (e.g. include an appropriate range of severity of impairment) to address regulatory requirements of the new region, but are conducted in a foreign region, and if evidence supports the extrapolation of the data in the general population to the new region, you will probably not need to address the issue of special populations again in the new region. Note, however, that for a new indication in a special population (e.g. pediatric depression) a region might require a separate bridging study.
3	Nov. 2003	I believe that my drug is sensitive to ethnic factors and that the medical settings in which it is used may vary among regions. Does this mean that my efficacy study in one region is of no value in support of my application in another?	No. Assuming the new region finds the studies in the first region pertinent, the regulatory authority of the new region will likely require a controlled study in its own region to establish efficacy (and/or to address other issues). E5 indicates, however, that the second region would be likely to consider a single such study adequate if the data from the foreign region otherwise meet all the requirements of the new region. If the new study supports the same conclusions as the study(ies) in the original region, no further confirmation should be needed, as the data from the original region would likely be considered to confirm the finding in the new region. In that case, the study in the new region need not necessarily have the identical dose and treatment effect size to confirm the findings from the initial region. There might also be situations in which the region would consider further safety data necessary. For example, if the new region considered a higher dose or more frequent dosing necessary and if this finding were not a pharmacokinetic effect, sponsors might need to provide additional safety data.
4	Nov. 2003	I believe that my drug is insensitive to ethnic factors and that there are no significant relevant differences in extrinsic factors, including the practice of medicine, among the regions. The pharmacokinetics of the drug are insensitive to intrinsic and extrinsic factors. The diagnosis and therapy of the conditions in the indication do not significantly vary among regions. Nonetheless, the regulatory authority of the new region is requiring an additional study of safety and efficacy for bridging. Is this requirement inconsistent with E5?	No, although you might want to discuss the issue with the regulatory authorities in the new region. E5 makes it clear that the need for a bridging study is always a matter of judgment and does not seek to discourage the new region’s asking for one. E5 specifically notes that familiarity with the other region is likely to be an important determinant of whether the new region asks for a bridging study. E5 does indicate the expectation that the regulatory authorities of new regions would request only those additional data necessary to assess the ability to extrapolate foreign data to the new region, but the amount of additional data called for is a matter of judgement on the part of the regulatory authority.
5	Nov. 2003	My drug has been approved in two ICH regions and I am about to meet with regulatory authorities in the third region to discuss an application for marketing. I believe that the new regulatory authority should accept the present data, and that regulatory authority should require little or no additional data. What information should I submit to support my case that additional data are not needed?	There are two distinct issues that need to be considered: 1) the adequacy of the data base and 2) the need for a bridging study. You will need to convince the regulatory authority that the available data are both adequate to meet the new region's requirements and that the data are applicable to the population of the new region. You should therefore indicate how your data address all the regulatory requirements of the new region. Where the choice of control groups, primary endpoints, or other key clinical trial design features are not those known to be considered acceptable to the new region, you should explain how and why they should be considered to meet the regulatory requirements of the new region. You should also indicate why the data and conclusions should be considered relevant to the new population. In doing this, you should identify the intrinsic factors (e.g. racial distribution) that differ between the regions and show that those factors do not substantially affect the drug effect (i.e. demonstrate that the drug is insensitive to any differences in ethnic factors). Data indicating that pharmacologically related compounds have similar effects in the two regions can be quite useful. You should also identify the extrinsic factors (e.g. diagnosis or management of the patient population studied) that you believe are generally similar to those in the intended population in the new region and explain why any significant differences would not alter conclusions to be drawn about the drug effect. 
Dose-response relationships should be evaluated to determine if these are sensitive to intrinsic or extrinsic factors, and whether the appropriate doses might vary markedly among individuals or ethnic groups.
6	Nov. 2003	I believe that my drug is insensitive to ethnic factors and that drugs in its class have similar activity in all regions. However, the endpoints I studied and/or the control group I used were considered acceptable to the regions in which the studies were conducted but not to the new region. Does E5 indicate that the new region should accept those data as evidence of efficacy?	No. E5 indicates clearly that it applies only when the foreign clinical data address all the regulatory requirements of the new region, but come from a different region. E5 does not address the regulatory requirements of individual regions. If your choice of clinical endpoints or control group is not considered acceptable to the new region, and if you cannot convince regulators in that region otherwise, then E5 does not apply to this situation. Early discussion with regulators in regions where endpoints, control groups, inclusion criteria or diagnostic criteria might differ should be considered part of planning clinical studies to meet an individual region’s requirements. In this situation, the regulatory authority in the new region may require you to conduct a study using agreed-upon criteria in the new region.
7	Nov. 2003	I believe my drug is insensitive to ethnic factors. However, there is a clear difference in medical practice and the use and perceived need for certain drugs in the targeted therapeutic area. Does E5 indicate that the new region should accept those data as evidence of efficacy?	No. As described, the data base might not be acceptable to the new region, apart from concerns about ethnic differences, because the data do not refer to a disease that the new region considers pertinent.
8	Nov. 2003	My drug has been shown to be effective in preventing certain clinical events. However, the rate of these events is clearly different in the new region, even though the pathophysiology is the same. Does E5 indicate that the new region should accept those data as pivotal evidence of efficacy?	No. Certainly, in most cases where there is a definitive outcome study in another region, a region would probably not require that the study be repeated locally. There could, however, be exceptions; for example, if the event rate is indeed lower in the new region, and the risk reduction is the same in both regions, the actual number of patients benefited will be smaller and an adverse effect could become more important, affecting the benefit to risk relationship of the drug. A new region, in some cases, might need a clinical trial to assess the value of the drug.
9	Nov. 2003	My drug is approved for various indications in one region and it is shown in a bridging study in the primary indication that the data can be extrapolated. Does this mean that the new regions should accept all indications without further data?	No. Whether or not the new region will require further data would be decided on a case-by-case basis, depending on whether the "bridged" indication was thought to satisfy all concerns about potential ethnic differences. For example, the additional indications might be extensions of the primary indication (perhaps not calling for an additional bridging study) or quite new uses (perhaps calling for bridging). It is recommended that early consultation and discussions be held with the authorities in the new region.
10	Nov. 2003	E5 expresses the principle that, as experience with interregional acceptance of foreign clinical data increases, there will be a better understanding of situations in which bridging studies are needed and that it is hoped that, with these experiences, the need for bridging data will lessen. Is this principle still valid?	Yes, this is the expectation. The accumulation of experience by each region with implementation of the E5 guidance continues to add to our understanding of situations in which a bridging study would be considered necessary by a new region. The expectation continues to be that, with this experience, the need for a bridging study will lessen.
11	June 2006	There seems to be an impression that the E5 bridging study would always be conducted after data in the original region is complete. Is this correct? It may be desirable in certain situations to achieve the goal of bridging by conducting a multi-regional trial under a common protocol that includes sufficient numbers of patients from each of multiple regions to reach a conclusion about the effect of the drug in all regions. Please provide points to consider in designing, analyzing and evaluating such a multi-regional trial.	Bridging data should allow for extrapolation of data from one region to another. Although E5 speaks generally to extrapolation of data to a new region, E5 was not intended to suggest that the bridging study should necessarily follow development in another region. In the answer to Q1, it is made clear that it is also possible to include earlier studies conducted in several regions in a global drug development program so that bridging data might become available sooner. This can expedite completion of a global clinical development program and facilitate registration in all regions. A bridging study therefore can be done at the beginning, during or at the end of a global development program. For a multi-regional trial to serve as a bridging study for a particular region, it would need to have persuasive results in that region, because it is these regional results that can convince the regulators in that region that the drug is effective, and can "bridge" the results of trials in other regions in the registration application. A multi-regional trial for the purpose of bridging could be conducted in the context of a global development program designed for near simultaneous world-wide registration. The objectives of such a study would be: 1) to show that the drug is effective in the region and 2) to compare the results of the study between the regions with the intent of establishing that the drug is not sensitive to ethnic factors. 
The primary endpoint(s) of the study should be defined and acceptable to the individual regions and data on all primary endpoints should be collected in all regions under a common protocol. In instances where the primary endpoints to be used by the regions are different, data for comparison purposes on all primary endpoints should be collected in all regions. For a study intended to serve as a bridging study, the following points should be considered:
Planning: The multi-regional trial would have to satisfy requirements of the region where the application is to be filed with respect to design and analysis (see answer to Q1). In general, a multi-regional study should be designed with sufficient numbers of subjects so that there is adequate power to have a reasonable likelihood of showing an effect in each region of interest. Minor differences in design (e.g., age inclusion criteria, concomitant medication, etc.) may be acceptable and prior discussion with regulatory agencies is encouraged. For safety evaluation, it is important to make as uniform as possible the method for collection and assessment of safety information among regions.
Analysis: Given the goal of the multi-regional bridging study, it is critical to provide efficacy and safety results by region, with attention given to the usual analyses (e.g., demographic and baseline variables, patient disposition). It will be of interest also to examine consistency of effects across regions. In a dose response study, it will be especially important to analyze dose response relationships for efficacy and safety both within the regions and across the regions.
Evaluation: It is difficult to generalize about what study results would be judged persuasive, as this is clearly a regional determination, but a “hierarchy of persuasiveness” can be described.
1. Stand Alone Regional Result: The most persuasive would be demonstration of the effect in the entire study, with the results of each region of interest also demonstrating a statistically significant result. It will also be important to compare results across regions.
2. No Significant Regional Result but Similar Results across Regions: With an effect demonstrated in the entire study, an analysis of results by region might not show a significant result in a region of interest but the data might nonetheless be persuasive to regulators in that region. Consistent trends in endpoint(s) intended for comparison across the regions or, in the case of a dose-response study, similar dose-response relationships across regions, might support an argument that the drug is not sensitive to intrinsic or extrinsic ethnic factors. Other data, for example, from approved drugs in the same class within region(s) could support such a bridging conclusion.
Other consideration: This Q & A discusses use of multi-regional studies as bridging studies. There are other possible uses of multi-regional studies. For example, at an early stage of development, such studies could compare various endpoints in an exploratory setting in different regions to guide a synchronized global development plan.
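
The planning point in the answer to Q11, that each region of interest needs enough subjects for adequate power, can be made concrete with a rough calculation. The sketch below is an illustration only and is not part of E5: it assumes a two-arm comparison of means analysed with a two-sided z-test (normal approximation), and the region names, sample sizes, effect size and standard deviation are all hypothetical.

```python
from statistics import NormalDist


def power_two_sample(n_per_arm: int, effect: float, sd: float, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided z-test comparing two arm means.

    Assumes equal allocation, a common known standard deviation `sd`,
    and a true mean difference `effect` (all hypothetical inputs).
    """
    if n_per_arm <= 0:
        return 0.0
    z = NormalDist()
    se = sd * (2.0 / n_per_arm) ** 0.5        # standard error of the difference
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)     # two-sided critical value
    shift = abs(effect) / se                  # standardised true effect
    # Probability of rejecting in either tail.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Hypothetical subjects per arm enrolled in each region.
regions = {"Region A": 200, "Region B": 120, "Region C": 40}
effect, sd = 5.0, 15.0                        # hypothetical effect size and SD

powers = {name: power_two_sample(n, effect, sd) for name, n in regions.items()}
overall = power_two_sample(sum(regions.values()), effect, sd)

for name, p in powers.items():
    print(f"{name}: n/arm={regions[name]}, power={p:.2f}")
print(f"Pooled: n/arm={sum(regions.values())}, power={overall:.2f}")
```

Under these assumed numbers the pooled study is well powered while the smallest region is not, which corresponds to the second level of the hierarchy: an overall effect without a significant stand-alone result in one region of interest.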

ICH E5

ICH E5(R1) 接受国外临床试验数据的种族因素问答

English Version

ICH E5(R1) Implementation Working Group Questions & Answers

序号	批准日期	问题	答复
1	2003年11月	我计划在全球范围内研发一种新药。E5 能提供指导意见吗？	E5 正是为这种情况提供一些指导意见。E5 主要解决在一两个地区进行的研发项目如何在另一个地区获批的问题。E5 指出，通常情况下，如果在一个地区获得的临床数据符合新地区的证据要求，但需考虑到两个地区间可能存在的内在或外在种族差异性时，可以通过增加一个桥接研究使现有临床数据能够外推至新地区人群。桥接研究可以是药效学研究或完整的临床试验，也可能是剂量-效应关系研究。桥接研究可允许将合适的临床数据集外推至新地区人群。它作为全球研发计划的一部分，即同时而不是依次在多个地区进行，能合理且有效地评估潜在的区域差异性。例如，如果在多个地区的临床研究中有足够数量的受试人群来自于新地区，则有可能通过分析这些研究中种族差异性的影响，来确定整个临床数据集是否与新地区一致。在全球研究设计中需要考虑到的能影响新地区对于数据的接受程度的基本问题包括：a）对病情和患者的确定与诊断；b）对照组的选择；c）区域性的治疗目标或疗效指标的选择；d）安全性评估方法；e）医疗实践；f）临床研究持续时间；g）地区性合并用药；h）纳入受试者的严重程度分布；以及i）剂量和给药方案的相似性。为了确定您所提出的全球试验方案是否能符合特定地区的监管要求，建议与该地区的监管机构尽早协商与讨论。
2	2003年11月	我已在一个地区进行了药物的临床研究，涵盖了安全性、有效性、给药方案等方面以及在特殊人群如肝/肾功能不全患者、老年人、儿童以及妊娠期和哺乳期妇女中的使用。如果我能成功证明该药物在一般人群中的安全性、有效性和给药方案信息能外推至新地区人群（如通过桥接研究），是否还需进一步解决特殊人群数据外推的问题？	通常，如果特殊人群研究在试验设计上足以满足新地区的监管要求（如包括适当范围的不同损伤程度的受试人群），但是试验是在新地区以外的地区进行的，如果有证据支持一般人群研究数据可以外推至新地区，那么可能不需要重新研究新地区的特殊人群问题。但需注意，对于特殊人群中的新适应症（如儿童抑郁症），新地区可能会要求进行单独的桥接研究。
3	2003年11月	我认为我的药物对种族因素敏感，且在不同地区其应用的医疗环境也有所差异。这是否意味着在某一地区进行的有效性研究将无法支持在另一地区进行药品申报？	不是。假设发现新地区认为其研究结果与原地区具有相关性，则新地区的监管机构可能会要求在本地区进行对照临床研究，以确定药物的有效性（和/或解决其它问题）。但是，E5 指出，如果国外地区临床数据的其它方面均能符合新地区的所有要求，新地区可能会考虑仅进行一个单独的上述研究。如果新研究得出的结论与原地区相同，则原地区的临床数据被认为适用于新地区人群，因此无需进一步证实。在这种情况下，新地区的临床研究不一定需要用相同的剂量和治疗效应来证实之前在原地区的研究结果。在某些情况下，新地区可能会考虑要求额外的安全性数据。例如，如果新地区认为有必要使用更高的剂量或更高的给药频率，以及如果上述研究不是为了获得更多药代动力学结果，则申办者可能需要提供额外的安全性数据。
4	2003年11月	我相信我的药物对种族因素不敏感，且在各地区间包括医疗实践等相关外在因素的差异没有意义。该药物的药代动力学对内在和外在因素也不敏感。在各地区间适应症的诊断和治疗也无显著性差异。但是，新地区管理局仍要求提供额外的安全性和有效性桥接研究。该要求是否与 E5 相矛盾？	不是，但您也可以与新地区监管机构就此问题进行商讨。E5 明确指出，对桥接研究的要求始终只是一个判断性问题，并不能阻止新地区要求您进行桥接研究。E5 特别指出，对其它地区的熟悉程度可能是决定新地区是否要求进行桥接研究的重要因素。E5 期望新地区监管机构仅要求补充其它必需数据，以评估国外临床数据是否能外推到新地区人群，但是要求额外数据的数量则由监管机构自行议定。
5	2003年11月	我的药物已经在两个ICH 地区获得批准。我将与第三个地区的监管机构进行会谈，以讨论上市药品注册申报问题。我相信新的监管机构应该会接受现有临床数据，且可能仅要求提供少量额外数据或无需提供。我应当提供哪些信息来保证无需提供其它额外数据呢？	有两个明确的问题需要考虑：1）数据集的充分性和2）桥接研究的必要性。您需要使监管机构确信已获得的数据足以充分符合新地区的监管要求，且适用于新地区人群。因此，您应证明您的研究数据如何满足新地区所有的监管要求。如果新地区认为不能接受对照组、主要终点或其他关键临床试验设计特征，您应该解释这些问题该如何考虑，以及为什么能够满足新地区的监管要求。您还应说明为何临床数据和结果可以外推至新地区人群。在处理这个问题时，您应确定不同地区间存在差异的内在因素（如人种分布），并指出这些因素不会对药效造成显著性影响（即证实药物对所有种族因素不敏感）。用数据证明作用机制类似的化合物在两个地区的药理作用相似是非常重要的。您还应确定您认为与新地区的目标人群基本相似的外在因素（例如所研究的患者人群的诊断和处理），并解释为何所有显著性差异均不会对药物疗效的相关结论造成影响。应当评价量效关系以判定其是否对内在或外在因素敏感，以及一个合适的给药剂量是否在不同个体或种族中发生明显的变化。
6	2003年11月	我相信我的药物对种族因素不敏感，且同类药物在所有地区均具有相似的药理活性。但是，某些地区临床研究中所选用且被接受的研究终点和/或对照组在新地区却未被认可。E5 中是否有指出新地区应当接受上述有效性临床数据？	不是。E5 明确指出，只有当国外临床数据（来自不同地区）能满足新地区的所有监管要求时，上述情况才适用。E5 不会解决单个地区的监管要求。如果您选择的临床终点或对照组不被新地区所接受，且无法说服该地区的监管者，则E5 将不适用。在早期与该地区监管者就研究终点、对照组、入选标准或诊断标准可能存在的差异进行讨论，应被认为是临床研究计划的一部分，以满足新地区监管要求。在该情况下，新地区监管者可以要求您使用经同意后的统一标准在新地区进行临床研究。
7	2003年11月	我相信我的药物对种族因素不敏感。但是，在目标治疗领域，医疗实践和对某些药物的使用与主观需求方面存在明显差异。E5 是否有指出新地区应接受现有数据作为有效性的证据？	不是。如上所述，除了对种族差异的担忧之外，现有的数据不能适用于新地区对相关疾病的考虑，因此新地区可能不会接受现有的数据。
8	2003年11月	我的药物已经被证实能有效预防某些临床事件的发生。但是，即使病理学结果相同，这些事件的发生率在新地区有明显不同。E5 是否有指出新地区应接受现有数据作为有效性的关键证据？	不是。可以肯定的是，在大多数情况下，在一个地区获得确定结果的研究，在另一地区则可能无需再重复。但也有例外。例如，如果某一事件的发生率在新地区确实较低，且两个地区的风险降低程度相同，那么新地区的受益患者实际人数将会有所减少，且不良反应会表现得更加重要，进而影响到药物的获益/风险比。某些情况下，新地区需要进行临床研究以评估药物的价值。
9	2003年11月	我的药物已在一个地区被批准用于治疗多种适应症，并且在一项主要适应症的桥接研究中被证实能外推至新地区人群。这是否意味着在不提供额外临床数据的情况下，新地区会批准所有适应症的治疗。	不是。新地区是否要求提供额外的临床数据取决于具体情况，依赖于“桥接”适应症是否满足所有潜在种族差异性相关问题。例如，附加的适应症或许是主要适应症范围延伸（可能不要求补充进行桥接研究）或全新的使用范围（可能需要进行桥接研究）。建议早期与新地区监管者进行协商和讨论。
10	2003年11月	E5 曾表述过这样的原则，即随着对地区间接受国外临床数据经验的增加，对于哪些情况下需要进行桥接研究有了更好的理解，因此可以期望随着这些经验的增加，地区间对桥接研究的需求也会逐渐减少。该原则是否仍有效？	是，这是我们所期望的。随着每个地区执行E5 指导原则积累经验的增加，将会继续增加新地区对是否有必要进行桥接研究的理解。随着上述经验的增加，可以期望地区间对桥接研究的需求也会逐渐减少。
11	2006年6月	印象中E5 桥接研究通常是在原地区临床数据完成后实施。这是否正确？在某些情况下想要达到桥接研究的目的，可以通过在相同研究方案下进行多地区临床研究，需要保证每一个地区均涵盖足够数量的患者，从而得出药物在所有地区的有效性结论。请对多地区临床研究的设计、分析和评价方面的考虑提供指导性意见。	通过桥接研究，可以允许将研究数据从一个地区外推至另一个地区。尽管E5 指出研究数据可以外推至新地区，但这并不表示另一个地区的桥接研究必须在整体研发之后进行。Q1 答复中明确指出，为了尽早获得桥接数据，全球药物研发计划可以在多个地区同时进行早期研究。这样可以加速完成全球临床研发计划，并有助于药品在所有地区的注册申报。因此，桥接研究可以在全球研发计划开始时、进行中或结束时开展。以多地区临床试验来作为特定地区的桥接研究，需要获得该地区有说服力的研究结果。这是因为该地区研究结果可以使该地区监管者确认药物的有效性，且可以“桥接”注册申报中其他地区的研究结果。出于桥接目的的多地区临床研究，可以在计划全球同步注册的研发方案中实施。该研究目的是：1）证实药物在该地区的有效性和2）比较不同地区的研究结果以确定药物对种族因素不敏感。应确定主要终点并被各地区所认可，在相同研究方案的条件下收集所有地区的所有主要终点数据。如果不同地区所采用的主要终点有差异，则应收集所有地区的用于比较所有主要终点的数据。对于一个预期可以作为桥接研究的试验，应考虑以下几点：计划：多地区试验需满足申报地区在研究设计和分析方面的监管要求（见Q1答复）。通常，多地区研究应包括足够数量的受试者以便具有足够的把握度来合理地证明药物在每个申报地区的有效性。研究设计上的细微差异（如年龄入选标准、合并用药等）是可以接受的，鼓励与监管机构在前期讨论这些问题。对于安全性评价，尽可能就不同地区间安全性信息的收集和评价方法达成一致是十分重要的。分析：对于以桥接为目的的多地区研究，提供各个地区有效性和安全性结果是至关重要的，并应关注常规分析结果（如人口统计学和基线指标、患者分布）。检查不同地区间疗效的一致性也是非常重要的。在量效关系研究中，从有效性和安全性方面对地区内和不同地区间均进行量效关系分析尤为关键。评价：由于这显然由各地区自行判定，因此很难概括出怎样的研究结果会被判定为有说服力，但可以通过“说服力级别”进行描述。 1. 立足于独立的区域性结果：最具有说服力的结果是在整个研究中证实药物的疗效，同时各个地区也能证实研究结果具有统计学意义。此外，比较不同地区间结果也非常重要。 2. 区域性结果无显著性，但各个地区间结果相似：虽然整个研究证实了药物的疗效，但按照区域分析可能显示某特定地区未获得显著性结果，但临床数据仍可能说服该地区监管者。如果各地区间结果比较，其终点有一致性趋势，或在量效关系研究中各个地区表现出相似的量效关系，则可以支持该药物对内在或外在因素不敏感的结论。其他数据，例如在地区内（间）已经批准了同类药物，也可以用于支持桥接研究结论。其他考虑事项：本问答对使用多地区研究作为桥接研究的问题进行了讨论。多地区研究还具有其他用途。例如，在研发的早期，这种多地区研究可以在探索性条件下对不同地区的各个研究终点进行比较，以便为全球同步研发计划提供指导意见。

ICH E6

ICH E7

ICH E8

ICH E8(R1) General Considerations for Clinical Studies

中文版

ICH E8(R1) 临床试验的一般考虑

1. OBJECTIVES OF THIS DOCUMENT

Clinical studies of medicinal products are conducted to provide information that can ultimately improve access to safe and effective products with meaningful impact on patients, while protecting those participating in the studies. This document provides guidance on the clinical development lifecycle, including designing quality into clinical studies, considering the broad range of clinical study designs and data sources used.

The ICH document "General Considerations for Clinical Studies" is intended to:

1. Describe internationally accepted principles and practices in the design and conduct of clinical studies that will ensure the protection of study participants and facilitate acceptance of data and results by regulatory authorities

2. Provide guidance on the consideration of quality in the design and conduct of clinical studies across the product lifecycle, including the identification, during study planning, of factors that are critical to the quality of the study, and the management of risks to those factors during study conduct

3. Provide an overview of the types of clinical studies performed during the product lifecycle, and describe study design elements that support the identification of quality factors critical to ensuring the protection of study participants, the integrity of the data, the reliability of results, and the ability of the studies to meet their objectives

4. Provide a guide to the ICH efficacy documents to facilitate users' access

General principles are described in Section 2 of this document, followed by a discussion of designing quality into clinical studies in Section 3. A broad overview of drug development planning and the information provided by different types of studies needed to progress development through the lifecycle of the product is given in Section 4. In Section 5, important elements of clinical study design are described that reflect the variety of designs used in drug development as well as the range of data sources available. Section 6 addresses study conduct, ensuring the safety of study participants, and study reporting. Some considerations for identifying factors that are critical to the quality of a study are provided in Section 7.

The ICH Efficacy guidelines are an integrated set of guidance covering the planning, design, conduct, safety, analysis, and reporting of clinical studies. ICH E8 provides an overall introduction to clinical development, designing quality into clinical studies and focusing on those factors critical to the quality of the studies. The guidelines should be considered and used in an integrated, holistic way rather than focusing on only one guideline or subsection.

For the purposes of this document, a clinical study is meant to refer to a study of one or more medicinal products in humans, conducted at any point in a product’s lifecycle, both prior to and following marketing authorisation. The focus is on clinical studies to support regulatory decisions, recognizing these studies may also inform health policy decisions, clinical practice guidelines, or other actions. The term "drug" should be considered synonymous with therapeutic, preventative, or diagnostic medicinal products. The term “drug approval” refers to obtaining marketing authorisation for the drug.

2. GENERAL PRINCIPLES

2.1 Protection of Clinical Study Participants

Important principles of ethical conduct of clinical studies and the protection of participants, including special populations, have their origins in the Declaration of Helsinki and should be observed in the conduct of all human clinical investigations. These principles are stated in other ICH guidelines, in particular, ICH E6-Good Clinical Practice.

As further described in the E6 guideline, the investigator and sponsor have responsibilities for the protection of study participants together with the Institutional Review Board/Independent Ethics Committee.

The confidentiality of information that could identify participants should be protected in accordance with the applicable regulatory and legal requirement(s).

Before initiating a clinical study, sufficient information should be available to ensure that the drug is acceptably safe for the planned study in humans. Emerging non-clinical, clinical, and pharmaceutical quality data should be reviewed and evaluated, as they become available, by qualified experts to assess the potential implications for the safety of study participants. Ongoing and future studies should be appropriately adjusted as needed, to take new knowledge into consideration and to protect study participants. Throughout drug development, care should be taken to ensure all study procedures and assessments are necessary from a scientific viewpoint and do not place undue burden on study participants.

2.2 Scientific Approach in Clinical Study Design, Planning, Conduct, Analysis, and Reporting

The essence of clinical research is to ask important questions and answer them with appropriate studies. The primary objectives of any study should reflect the research questions and be clear and explicitly stated. Clinical studies should be designed, planned, conducted, analysed, and reported according to sound scientific principles to achieve their objectives.

Quality of a clinical study is considered in this document as fitness for purpose. The purpose of a clinical study is to generate reliable information to answer the research questions and support decision making while protecting study participants. The quality of the information generated should therefore be sufficient to support good decision making.

Quality by design in clinical research sets out to ensure that the quality of a study is driven proactively by designing quality into the study protocol and processes. This involves the use of a prospective, multidisciplinary approach to promote the quality of protocol and process design in a manner proportionate to the risks involved, and clear communication of how this will be achieved.

Across the product lifecycle, different types of studies will be conducted with different objectives and designs and may involve different data sources. For purposes of this guideline, development planning is considered to cover the entire product lifecycle (Section 4). The Annex provides a broad categorisation of study type by objective within the different stages of drug development. Studies should be rigorously designed to address the study objectives with careful attention to the design elements, such as the choice of study population and response variables and the use of methods to minimize biases in the findings (Section 5).

The cardinal logic behind serially conducted studies is that the results of prior studies should inform the plan of later studies. Emerging data will frequently prompt a modification of the development strategy. For example, results of a confirmatory study may suggest a need for additional human pharmacology studies.

The availability of multi-regional data as a result of the increased globalisation of drug development programmes, facilitated by the harmonisation of ICH Guidelines, minimises the need to conduct individual studies in different regions. The results of a study are often used in regulatory submissions in multiple regions, and the design should also consider the relevance of the study results for regions other than the one(s) in which the study is conducted. Further guidance is provided by ICH E5 Ethnic Factors, ICH E6, and ICH E17 Multi-Regional Clinical Trials.

Early engagement with regulatory authorities to understand local/regional requirements and expectations is encouraged and will facilitate the ability to design quality into the study.

2.3 Patient Input into Drug Development

Consulting with patients and/or patient organisations during drug development can help to ensure that patients’ perspectives are captured. The views of patients (or of their caregivers/parents) can be valuable throughout all phases of drug development. Involving patients early in the design of a study is likely to increase trust in the study, facilitate recruitment, and promote adherence. Patients also provide their perspective of living with a condition, which may contribute to the determination, for example, of endpoints that are meaningful to patients, selection of the appropriate population and duration of the study, and use of acceptable comparators. This ultimately supports the development of drugs that are better tailored to patients’ needs.

3. DESIGNING QUALITY INTO CLINICAL STUDIES

The quality by design approach to clinical research (Section 3.1) involves focusing on critical to quality factors to ensure the protection of the rights, safety, and wellbeing of study participants, the generation of reliable and meaningful results, and the management of risks to those factors using a risk-proportionate approach (Section 3.2). The approach is supported by the establishment of an appropriate framework for the identification and review of critical to quality factors (Section 3.3) at the time of design and planning of the study, and throughout its conduct, analysis, and reporting.

3.1 Quality by Design of Clinical Studies

Good planning and implementation of a clinical study also derive from attention to the design elements of clinical studies as described in Section 5, such as:

the need for clear pre-defined study objectives that address the primary scientific question(s);
selection of appropriate participants that have the disease, condition, or molecular/genetic profile that is being studied;
use of approaches to minimise bias, such as randomisation, blinding or masking, and/or control of confounding;
endpoints that are well-defined, measurable, clinically meaningful, and relevant to patients.

Operational criteria are also important, such as ensuring a clear understanding of the feasibility of the study, selection of suitable investigator sites, quality of specialised analytical and testing facilities and procedures, and processes that ensure data integrity.

3.2 Critical to Quality Factors

A basic set of factors relevant to ensuring study quality should be identified for each study. Emphasis should be given to those factors that stand out as critical to study quality. These critical to quality factors are attributes of a study whose integrity is fundamental to the protection of study participants, the reliability and interpretability of the study results, and the decisions made based on the study results. These quality factors are considered to be critical because, if their integrity were to be undermined by errors of design or conduct, the reliability or ethics of decision-making based on the results of the study would also be undermined. Critical to quality factors should also be considered holistically, so that dependencies among them can be identified. Section 7 of this document provides considerations that can help identify critical to quality factors for a study.

The design of a clinical study should reflect the state of knowledge and experience with the drug; the condition to be treated, diagnosed or prevented; the underlying biological mechanism (of both the condition and the treatment); and the population for which the drug is intended. As research progresses, knowledge increases and uncertainties about the pharmacology, safety and efficacy of a drug decrease. Knowledge of the drug at any point in development will continually inform the identification of critical to quality factors and control processes used to manage them.

Proactive communication of the critical to quality factors and risk mitigation activities will support understanding of priorities and resource allocation by the sponsor and investigator sites. Proactive support (e.g., training to site staff, relevant to their role, and description of critical to quality factors and potential mitigation measures in the protocol) will enhance correct implementation of study protocol, procedures, and associated operational plans and process design.

Perfection in every aspect of an activity is rarely achievable or can only be achieved by use of resources that are out of proportion to the benefit obtained. The quality factors should be prioritised to identify those that are critical to the study, at the time of the study design, and study procedures should be proportionate to the risks inherent in the study and the importance of the information collected. The critical to quality factors should be clear and should not be cluttered with minor issues (e.g., due to extensive secondary objectives or processes/data collection not linked to the proper protection of the study participants and/or primary study objectives).

3.3 Approach to Identifying the Critical to Quality Factors

A key aspect of a quality approach to study design is to ask whether the objectives being addressed by the study are clearly articulated; whether the study is designed to meet the research question it sets out to address; whether these questions are meaningful to patients; and whether the study hypotheses are specific and scientifically valid. The approach to the identification of the critical to quality factors should consider whether those objectives can be met, well and most efficiently, by the chosen design and data sources. Patient consultation early in the study design process can contribute to this approach and ultimately help to identify the critical to quality factors. Study designs should be operationally feasible and avoid unnecessary complexity. Protocols and case report forms/data collection methods should enable the study to be conducted as designed and avoid unnecessary data collection.

Identification of critical to quality factors will be enhanced by approaches that include the following elements:

3.3.1 Establishing a Culture that Supports Open Dialogue

Creating a culture that values and rewards critical thinking and open, proactive dialogue about what is critical to quality for a particular study or development programme, going beyond sole reliance on tools and checklists, is encouraged. Open dialogue can facilitate the development of innovative methods for ensuring quality.

Inflexible, “one size fits all” approaches should be discouraged. Standardised operating procedures are necessary and beneficial for conducting good quality clinical studies, but study specific strategies and actions are also needed to effectively and efficiently support quality in a study.

Evidence used to inform the study design should be gathered and reviewed, before and during the study, in a transparent manner, while acknowledging gaps in data and conflicting data, where present and known, and anticipating the possible emergence of such gaps or conflicts.

3.3.2 Focusing on Activities Essential to the Study

Efforts should be focused on activities that are essential to the reliability and meaningfulness of study outcomes for patients and public health, and the safe, ethical conduct of the study for participants. Consideration should be given to eliminating nonessential activities and data collection from the study to increase quality by simplifying conduct, improving study efficiency, and targeting resources to critical areas. Resources should be deployed to identify and prevent or control errors that matter.

3.3.3 Engaging Stakeholders in Study Design

Clinical study design is best informed by input from a broad range of stakeholders, including patients and healthcare providers. It should be open to challenge by subject matter experts and stakeholders from outside, as well as within, the sponsor organisation.

The process of building quality into the study may be informed by participation of those directly involved in successful completion of the study such as clinical investigators, study coordinators and other site staff, and patients/patient organisations. Clinical investigators and potential study participants have valuable insights into the feasibility of enrolling participants who meet proposed eligibility criteria, whether scheduled study visits and procedures may be overly burdensome and lead to early dropouts, and the general relevance of study endpoints and study settings to the targeted patient population. They may also provide insight into the value of a treatment in the context of ethical issues, culture, region, demographics, and other characteristics of subgroups within a targeted patient population.

Early engagement with regulatory authorities is encouraged, particularly when a study has novel elements considered critical to quality (e.g., defining patient populations, procedures, or endpoints).

3.3.4 Reviewing Critical to Quality Factors

Accumulated experience and knowledge, together with periodic review of critical to quality factors should be used to determine whether adjustments to risk control mechanisms are needed, because new or unanticipated issues may arise once the study has begun.

Studies with adaptive features and/or interim decision points need specific attention during proactive planning and ongoing review of critical to quality factors, and risk management (ICH E9 Statistical Principles for Clinical Trials).

3.3.5 Critical to Quality Factors in Operational Practice

The foundation of a successful study is a protocol that is both scientifically sound and operationally feasible. A feasibility assessment involves consideration of study design and implementation elements that could impact the successful completion of clinical development from an operational perspective.

Feasibility considerations also include but are not limited to regional differences in medical practice and patient populations, the availability of qualified investigators/site personnel with experience in conducting a clinical study (ICH E6), availability of equipment and facilities required to successfully conduct the study, availability of the targeted patient population, and ability to enrol a sufficient number of participants to meet the study objectives. The retention and follow up of study participants are also key critical to quality factors. Consideration of these and other critical to quality factors relating to study feasibility can inform study design and enhance quality implementation.

4. DRUG DEVELOPMENT PLANNING

This section provides general principles to consider in drug development planning. Drug development planning adheres to the principles of scientific research and good study design that ensure the reliability and interpretability of results. Efficient drug development includes appropriately planned interactions with regulatory authorities throughout development to ensure alignment with requirements for product quality and to support approval in the condition or disease, including possible post-approval studies to address remaining questions. Throughout this process there is critical attention to the protection of the rights, safety and wellbeing of study participants.

Drug development planning builds on knowledge acquired throughout the investigational process to reduce levels of uncertainty as the process moves from target identification through non-clinical and clinical evaluation. Such planning encompasses quality of medicinal product, including chemistry, manufacturing and controls (CMC), and non-clinical and clinical studies (pre and post-approval). Modelling and simulation may inform drug development throughout the process. Planning may also include regional considerations for product introduction into the market, such as health technology assessments.

It is important to ensure that the experiences, perspectives, needs, and priorities of relevant stakeholders relating to the development and evaluation of the drug throughout its lifecycle are captured and meaningfully incorporated into drug development planning.

Clinical development may also feature requirements for co-development of validated biomarkers, diagnostic testing, or devices that facilitate the safe and effective use of a drug.

The types of studies that may contribute to drug development are described in subsections 4.2 and 4.3 and summarised in the Annex.

4.1 Quality of Investigational Medicinal Product

Ensuring adequate quality and characterisation of physicochemical properties of investigational medicinal product is an important element in planning a drug development programme and is addressed in ICH and regional quality guidelines. More extensive characterisation may be required for complex or biological products. Formulations should be well characterised in the drug development plan, including information on bioavailability, wherever feasible, and should be appropriate for the stage of drug development and the targeted patient population. Age-appropriate formulation development may be a consideration when clinical studies are planned in paediatric populations (ICH E11-E11A Clinical Trials in Pediatric Population).

Evaluation of the quality of a drug may extend to devices required for its administration or a companion diagnostic to identify the targeted population.

Changes in a product during development should be supported by comparability data to ensure the ability to interpret study results across the development programme. This includes establishing links between formulations through bioequivalence studies or other means.

4.2 Non-Clinical Studies

Guidance on non-clinical safety studies is provided in ICH M3 Nonclinical Safety Studies, ICH Safety (S) Guidelines and related Q&A documents, as well as in regional guidance. The nonclinical assessment usually includes toxicology, carcinogenicity, immunogenicity, pharmacology, pharmacokinetics, and other evaluations to support clinical studies (and may encompass evidence generated in in vivo and in vitro models, and by modelling and simulation). The scope of non-clinical studies, and their timing with respect to clinical studies, depend on a variety of factors that inform further development, such as the drug’s chemical or molecular properties; pharmacological basis of principal effects (mechanism of action); route(s) of administration; absorption, distribution, metabolism, and excretion (ADME); physiological effects on organ systems; dose/concentration-response relationships; metabolites; and duration of action and use. Use of the drug in special populations (e.g., pregnant or breast-feeding women, children) may require additional non-clinical assessments. Guidance for non-clinical safety studies to support human clinical studies in special populations should be reviewed (see, e.g., ICH S5 Reproductive Toxicology, S11 Nonclinical Paediatric Safety, and M3).

Assessment of the preclinical characteristics, including physiological and toxicological effects of the drug, serves to inform clinical study design and planned use in humans. Before proceeding to studies in humans there should be sufficient non-clinical information to support initial human doses and duration of exposure.

4.3 Clinical Studies

Clinical drug development, defined as studying the drug in humans, is conducted in a sequence that builds on knowledge accumulated from non-clinical and previous clinical studies. The structure of the drug development programme will be shaped by many considerations and comprised of studies with different objectives, different designs, and different dependencies. The Annex provides an illustrative list of example studies and their objectives. Although clinical drug development is often described as consisting of four temporal phases (phases 1-4), it is important to appreciate that the phase concept is a description and not a requirement, and that the phases of drug development may overlap or be combined.

To develop new drugs efficiently, it is essential to identify their characteristics in the early stages of development and to plan an appropriate development programme based on this profile. Initial clinical studies may be more limited in size and duration to provide an early evaluation of short-term safety and tolerability as well as proof of concept of efficacy. These studies may provide pharmacodynamic, pharmacokinetic, and other information needed to choose a suitable dosage range and/or administration schedule to inform further clinical studies. As more information is known about the drug, clinical studies may expand in size and duration, may include more diverse study populations, and may include more secondary endpoints in addition to the primary measures of efficacy. Throughout development, new data may suggest the need for additional studies.

The use of biomarkers has the potential to facilitate the availability of safer and more effective drugs, to guide dose selection, and to enhance a drug’s benefit-risk profile (see ICH E16 Qualification of Genomic Biomarkers) and may be considered throughout drug development. Clinical studies may evaluate the use of biomarkers to better target patients more likely to benefit and less likely to experience adverse reactions, or as intermediate endpoints that could predict clinical response.

The following subsections describe the types of studies that typically span clinical development from the first studies in humans through late development and post-approval.

4.3.1 Human Pharmacology

The protection of study participants should always be the first priority when designing early clinical studies, especially for the initial administration of an investigational product to humans (usually referred to as phase 1). These studies may be conducted in healthy volunteer participants or in a selected population of patients who have the condition or the disease, depending on drug properties and the objectives of the development programme.

These studies typically address one or a combination of the following aspects:

4.3.1.1 Estimation of Initial Safety and Tolerability

The initial and subsequent administration of a drug to humans is usually intended to determine the tolerability of the dose range expected to be evaluated in later clinical studies and to determine the nature of adverse reactions that can be expected. These studies typically include both single and multiple dose administration.

4.3.1.2 Pharmacokinetics

Characterisation of a drug's absorption, distribution, metabolism, and excretion continues throughout the development programme, but the preliminary characterisation is an essential early goal. Pharmacokinetic studies are particularly important to assess the clearance of the drug and to anticipate possible accumulation of parent drug or metabolites, interactions with metabolic enzymes and transporters, and potential drug-drug interactions. Some pharmacokinetic studies are commonly conducted in later phases to answer more specialised questions. For orally administered drugs, the study of food effects on bioavailability is important to inform the dosing instructions in relation to food. Obtaining pharmacokinetic information in sub-populations with potentially different metabolism or excretion, such as patients with renal or hepatic impairment, geriatric patients, children, and ethnic subgroups should be considered (ICH E4 Dose-Response Studies, E7 Clinical Trials in Geriatric Population, E11, and E5, respectively).

4.3.1.3 Pharmacodynamics & Early Measurement of Drug Activity

Depending on the drug and the endpoint of interest, pharmacodynamic studies and studies relating drug levels to response (PK/PD studies) may be conducted in healthy volunteer participants or in patients with the condition or disease. If there is an appropriate measure, pharmacodynamic data can provide early estimates of activity and efficacy and may guide the dosage and dose regimen in later studies.

4.3.2 Exploratory and Confirmatory Safety and Efficacy Studies

After initial clinical studies provide sufficient information on safety, clinical pharmacology and dose, exploratory and confirmatory studies (usually referred to as phases 2 and 3, respectively) are conducted to further evaluate both the safety and efficacy of the drug. Depending on the nature of the drug and the patient population, this objective may be combined in a single or small number of studies. Exploratory and confirmatory studies may use a variety of study designs depending on the objective of the study.

Exploratory studies are designed to investigate safety and efficacy in a selected population of patients for whom the drug is intended. Additionally, these studies aim to refine the effective dose(s) and regimen, refine the definition of the targeted population, provide a more robust safety profile for the drug, and include evaluation of potential study endpoints for subsequent studies. Exploratory studies may provide information on the identification and determination of factors that affect the treatment effect and, possibly combined with modelling and simulation, serve to support the design of later confirmatory studies.

Confirmatory studies are designed to confirm the preliminary evidence accumulated in earlier clinical studies that a drug is safe and effective for use for the intended indication and recipient population. These studies are often intended to provide an adequate basis for marketing approval, and to support adequate instructions for use of the drug and official product information. They aim to evaluate the drug in participants with or at risk of the condition or disease who represent those who will receive the drug once approved. This may include investigating subgroups of patients with frequently occurring or potentially relevant comorbidities (e.g., cardiovascular disease, diabetes, hepatic and renal impairment) to characterise the safe and effective use of the drug in patients with these conditions.

Confirmatory studies may evaluate the efficacy and safety of more than one dose or the use of the drug in different stages of disease or in combination with one or more other drugs. If the intent is to administer a drug for a long period of time, then studies involving extended exposure to the drug should be conducted (ICH E1 Clinical Safety for Drugs used in Long-Term Treatment). Irrespective of the intended duration of administration, the duration of effect of the drug will also inform the duration of follow-up.

Study endpoints selected for confirmatory studies should be clinically relevant and reflect disease burden or be of adequate surrogacy for predicting disease burden or sequelae.

4.3.3 Special Populations

Some groups in the general population require additional investigation during drug development because they have unique risk/benefit considerations, or because they can be anticipated to need modification of the dose or schedule of a drug. ICH E5 and E17 provide a framework for evaluating the impact of ethnic factors on a drug’s effect. Particular attention should be paid to the ethical considerations related to informed consent in vulnerable populations (ICH E6 and E11). Studies in special populations may be conducted during any phase of development to understand the drug effects in these populations. Some considerations of special populations are the following:

4.3.3.1 Investigations in pregnant women

Investigation of drugs that may be used in pregnancy is important. Where pregnant women volunteer to be enrolled in a clinical study, or a participant becomes pregnant while participating in a clinical study, follow-up evaluation of the pregnancy and its outcome and the reporting of outcomes are necessary.

4.3.3.2 Investigations in lactating women

Excretion of the drug or its metabolites into human milk should be examined where applicable and feasible. When nursing mothers are enrolled in clinical studies their babies are usually also monitored for the effects of the drug.

4.3.3.3 Investigations in children

ICH E11 provides an outline of critical issues in paediatric drug development and approaches to the safe, efficient, and ethical study of drugs in paediatric populations.

4.3.3.4 Investigations in geriatric populations

ICH E7 provides an outline of critical issues in developing drugs for use in geriatric populations and approaches to their safe, efficient, and ethical study.

4.3.4 Post-Approval Studies

After the approval of a drug, additional studies may be conducted to further understand the safety and efficacy of the drug in its approved indication (usually referred to as phase 4). These are studies that were not considered necessary for approval but are often important for optimising the drug's use. They may be of any type but should have valid scientific objectives. Post-approval studies may be conducted to address a regulatory requirement.

Post-approval studies may be performed to provide additional information on the efficacy, safety, and use of the drug in populations more diverse than included in the studies conducted prior to marketing authorisation. Studies with long-term follow-up or with comparisons to other treatment options or standards of care may provide important information on safety and efficacy. Commonly conducted studies include additional drug-drug interaction, dose-response or safety studies and studies designed to support use under the approved indication (e.g., mortality/morbidity studies, epidemiological studies). These studies may explore use of the drug in the real-world setting of clinical practice and may also inform health economics and health technology assessments.

4.4 Additional Development

After initial approval, drug development may continue with studies of new or modified indications in new patient populations, new dosage regimens, or new routes of administration. If a new dose, formulation, or combination is studied, additional non-clinical and/or human pharmacology studies may be indicated. Data from previous studies or from clinical experience with the approved drug may inform these programmes.

5. DESIGN ELEMENTS AND DATA SOURCES FOR CLINICAL STUDIES

Study objectives impact the choice of study design and data sources, which in turn impact the strength of a study to support regulatory decisions and clinical practice. As discussed in Section 4, there are a wide variety of study objectives in drug development. Similarly, there is a wide range of study designs and data sources to address these objectives. Sections 5.1 through 5.6 discuss key elements that may be used to define the study design, and Section 5.7 discusses the various data sources that may be used for the study.

Clear objectives will help to specify the study design, and conversely, the process of specifying the design may help to further clarify the objectives. At the design stage, the objectives may need to be modified if substantial practical considerations and limitations or other risks to critical to quality factors are identified. The study objectives are further refined through specification of estimands. Estimands, discussed in ICH E9(R1) Addendum: Statistical Principles for Clinical Trials, provide a precise description of the treatment effects reflecting the clinical questions posed by the study objectives. The estimand summarises at a population level what the outcomes would be in the same patients under the different treatment conditions being compared.

An important distinction between studies is whether the allocation of individuals to the study drug(s) is controlled by the study procedures or allocation to the drug is not controlled but exposure to the drug(s) is observed in the study. In this document, the former case is referred to as an interventional study and the latter case is referred to as an observational study.

Interventional studies, and in particular randomised studies, play a central role in drug development, as they can better control biases. The designs of randomised studies range from simple parallel group designs to more complex variants. For example, adaptive design studies allow prospectively planned modifications to the study, such as changes in the population studied or changes in doses of the drug studied over the course of the study, based on accumulating data. Master protocol studies allow for the investigation of multiple drugs or multiple conditions under a shared framework. Platform studies allow for multiple drugs to be investigated in a continuous manner, with different drugs entering the study at different times and leaving the study based on pre-specified decision rules.

Studies without randomisation (whether interventional or observational) can play a role as well in certain settings when randomisation is not feasible. Observational studies are often conducted post-approval but can be of utility as complementary sources of evidence during development and across the life cycle of a drug.

Along with the breadth of study designs, there are multiple sources of data that studies may employ. Traditionally, studies have used study-specific data collection processes. Data such as that obtained from electronic medical records or digital health technologies may be leveraged to increase the efficiency of studies or generalisability of study results.

This section presents important elements that define the design of a clinical study including population, treatment, control group, response variable, methods to reduce bias, statistical analysis, and data sources. It is intended to assist in identifying the critical to quality factors necessary to achieve the study objectives, while also enabling flexibility in study design and promoting efficiency in study conduct. Although the focus is on interventional studies, the discussion is intended to apply to both interventional and observational studies. The elements outlined here are expected to be relevant to study types and data sources that are used in clinical studies now and that may be developed in the future.

5.1 Study Population

The population to be studied should be chosen to support the study objectives and is defined through the inclusion and exclusion criteria for the study. The degree to which a study succeeds in enrolling the desired population will impact the ability of the study to meet its objectives.

The study population may be narrowly defined to reduce the risk to study participants or to maximise the sensitivity of the study for detecting a certain effect. Conversely, it may be broadly defined to more closely represent the diverse populations for which the drug is intended. In general, studies conducted early in a development programme, when little is known about the safety of the drug, are more homogeneous in study population definitions. Studies conducted in the later phases of drug development or post-approval are often more heterogeneous in study population definitions. Such studies should involve participants who are representative of the diverse populations which will receive the intervention in clinical practice. Available knowledge about participant characteristics that may predict disease outcomes or effects of the intervention can be used to further define the study population.

The number of participants (sample size) in a study should be large enough to provide a reliable answer to the questions addressed (see ICH E9). This number is usually determined by the primary objective of the study. If the sample size is determined on some other basis, then this should be made clear and justified. For example, a sample size determined to address safety questions or meet important secondary objectives may need larger numbers of participants than needed for addressing the primary efficacy question (see ICH E1). If study objectives include obtaining information on certain subgroups, then efforts should be made to ensure adequate representation of these subgroups.
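As an illustration of how a sample size can be tied to the primary objective, the following is a minimal sketch that assumes a two-sided comparison of two group means with equal allocation and a normal approximation. The function name `n_per_group`, its parameters, and the example values are hypothetical and are not taken from this guideline or from ICH E9. An actual justification would depend on the estimand, the analysis method, and expected losses to follow-up.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.90):
    """Approximate participants per group for comparing two means.

    Assumptions (illustrative only): equal allocation, common standard
    deviation `sigma`, two-sided significance level `alpha`, and a
    normal approximation to the test statistic.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)                            # round up to whole participants


if __name__ == "__main__":
    # Hypothetical inputs: detect a difference of 5 units when the SD is 10.
    print(n_per_group(delta=5.0, sigma=10.0))
    print(n_per_group(delta=5.0, sigma=10.0, power=0.80))
```

Under these assumptions, halving the difference to be detected roughly quadruples the number of participants, which is one reason secondary or safety objectives can call for more participants than the primary efficacy question.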

5.2 Treatment Description

The treatment(s), including controls, under study should be described explicitly and specifically. These might be individual treatments (including different doses or regimens), combinations of treatments, or no treatments, and can include specification of background treatments. The definition of treatments should align with the objectives of the study (ICH E9(R1)). For example, if the objective of the study is to understand the effect of the treatment in clinical practice, the study may specify that the background treatment, if any, is up to the discretion of the participants and healthcare providers. If the objectives are to understand the effect of the drug when added to a specific background treatment, the background treatment should be defined explicitly and specifically for all groups including controls.

5.3 Choice of Control Group

The major purpose of a control group is to separate the effect of the treatment(s) from the effects of other factors such as the natural course of the disease, other medical care received, or observer or patient expectations (ICH E10 Choice of Control Group in Clinical Trials). The treatment effect of interest may be the effect relative to not receiving the drug or the effect relative to receiving other therapies. Comparisons may be made with placebo, no treatment, standard of care, other treatments, or different doses of the drug under investigation.

The source of control group data may be internal or external to the study. The intent of using an internal control group is to help ensure that the only differences between treatment groups are due to the treatment they receive and not due to differences in the selection of participants, the timing and measurement of study outcomes, or other differences. A special case of an internal control group is when each participant serves as their own internal control by receiving the drug and control at different points of time. With use of an external control group, individuals are selected from an external source, and the individuals may have been treated at an earlier time (historical control group) or during the same time but in another setting than participants in the study.

Important limitations of the use of external controls are discussed in ICH E10. Particular care is needed to minimise the likelihood of erroneous inference. The use of an external control requires that the disease course is well known and predictable. External control individuals may differ from study participants with respect to demographic and background characteristics (e.g., medical history, concurrent diseases). In addition, external control individuals may differ from participants in the study with respect to concurrent care and the measurement of study outcomes and other data elements. Because the use of internal controls generally mitigates the potential for bias better than external controls, particularly in conjunction with randomisation, the suitability of the use and choice of external control should be carefully considered and justified. Section 5.5 discusses the sources of bias which can arise in observational studies and is relevant to the use of external controls.

Participant level data may not be available for some choices of external control groups. Summary measures may be available to form the basis of comparisons with treated participants to estimate drug effects and test hypotheses about those effects. There is, however, less ability to control for differences in characteristics between study individuals in the external control group and study participants in the internal treatment groups in making these comparisons or examining the quality and completeness of individual data elements. Additionally, there may not be the ability to examine subgroups or modify the response variable to be consistent with the response variable used in the study.

5.4 Response Variables

A response variable is an attribute of interest that may be affected by the drug. The response variable may relate to pharmacokinetics, pharmacodynamics, efficacy, or safety of the drug, or to the use of the drug including, for example, in adherence to risk minimisation measures post-approval. Study endpoints are the response variables that are chosen to assess drug effects.

The primary endpoint should be capable of providing clinically relevant and convincing evidence related to the primary objective of the study (ICH E9). Secondary endpoints are either supportive measurements related to the primary objective or measurements of effects related to the secondary objectives. Exploratory endpoints are used to further explain or to support study findings or to explore new hypotheses for later research. The choice of endpoints should be meaningful for the intended population and may also take into account the views of patients. The definition of each study endpoint should be specific and include how and at what time points in a participant’s treatment course of the drug and follow-up it is ascertained.

Knowledge of the drug, the clinical context, and the purpose of a given study affect what response variables should be collected. For example, a proof-of-concept study of relatively short duration may employ a pharmacodynamic outcome rather than the outcome of primary interest (ICH E9). A larger study of longer duration could then be used to confirm a clinically meaningful effect on the outcome of primary interest. In other cases, such as a study where the safety profile of the drug is well characterised, the extent of safety data collection may be tailored to the objectives of the study.

5.5 Methods to Reduce Bias

The study design should address potential sources of bias that can undermine the reliability of results. Although different types of studies are subject to different sources of bias, this section addresses some common sources. ICH E9 discusses principles for controlling and reducing bias mainly in the context of interventional studies.

In studies with internal control groups, randomisation is used to ensure comparability of treatment groups, thereby minimising the possibility of bias in treatment assignment.
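One common way to implement randomisation is a permuted block schedule, sketched below. This is an illustrative example only: the guideline does not prescribe a randomisation method, and the function name `permuted_block_schedule`, the arm labels, the block size, and the seed are all hypothetical. In practice the schedule would typically be generated and held by a party independent of the investigators, and might be stratified.

```python
import random


def permuted_block_schedule(n_participants, arms=("drug", "control"),
                            block_size=4, seed=2024):
    """Return a list of treatment assignments using permuted blocks.

    Each complete block contains every arm equally often, so group sizes
    stay balanced throughout enrolment. All defaults are illustrative.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)  # fixed seed so the schedule is reproducible
    schedule = []
    while len(schedule) < n_participants:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)     # random order within the block
        schedule.extend(block)
    return schedule[:n_participants]


if __name__ == "__main__":
    for number, arm in enumerate(permuted_block_schedule(8), start=1):
        print(number, arm)
```

Blocking keeps the numbers in each group close to equal at any point in enrolment, but small fixed blocks can make the next assignment predictable in an unblinded study, which is why block sizes are often varied or concealed.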

Randomisation at the start of the study addresses differences between the groups at the time of randomisation but does not prevent bias due to differences arising during the study. Events after randomisation (particularly intercurrent events (ICH E9(R1))) may affect the validity and interpretation of comparisons between treatment groups. Examples include treatment discontinuation or use of rescue medications. There may also be differences in the follow-up patterns between the groups due to participants in one group discontinuing the study at different rates, because of, for example, adverse events or perceived lack of efficacy. Careful consideration of the potential for intercurrent events to occur during the study and their impact will help with the identification of critical to quality factors, such as reducing study discontinuation, continuing data collection following treatment discontinuation, and retrieving data after study discontinuation, if appropriate. It is important when defining the treatment effect (estimand) to account for the occurrence of intercurrent events.

Concealing the treatment assignments (blinding) limits the occurrence of conscious or unconscious bias in the conduct and interpretation of a clinical study that may affect the course of treatment, monitoring, endpoint ascertainment, and participants’ responses. In a single-blind study the investigator is aware of the treatment, but the participant is not. When the investigators who are involved in the treatment or clinical evaluation of the participants are also unaware of the treatment assignments, the study is referred to as double-blind. In an open-label study, the consequences of the lack of blinding may be reduced through the use of pre-specified decision rules for aspects of study conduct, such as recruitment, treatment assignment, participant management, safety reporting, and response variable ascertainment. Blinding for staff at the study sites or sponsor should be implemented where feasible.

Knowledge of interim results (whether individual or treatment group level) has the potential to introduce bias or influence the conduct of the study and interpretation of study results. Specific considerations related to information flow and confidentiality are therefore necessary.

Observational studies introduce unique challenges to the assessment and control of bias. These include ensuring that the individuals have the condition under study and ensuring comparability between treatment groups, in prognostic factors associated with the choice of therapies, in the ascertainment of response variables, and in post-baseline concomitant patient care. These challenges may also exist with the use of external controls in an interventional study. Methods exist that may mitigate some of these challenges and should be considered during the design phase.

5.6 Statistical Analysis

The statistical analysis of a study encompasses important elements necessary to achieving the study objectives. The specification and documentation of the statistical analysis are important for ensuring the integrity of the study findings. The principal features of the statistical analysis should be planned during the design of the study and should be clearly specified in a protocol written before the study begins (ICH E9). Full details of the planned statistical analysis should be specified and documented before knowledge of the study results that may reveal the drug effects, which may be accomplished using a separate statistical analysis plan. The protocol should define the estimand(s) following the framework established in ICH E9(R1).

Statistical analyses of primary and secondary endpoints that address key study objectives with respect to both efficacy and safety should be described in the protocol, including any interim analyses and/or planned design adaptations. Other statistical aspects of the study that should be described in the protocol include the analytical methods for any planned estimation and tests of hypotheses about the drug effect and a justification of the sample size.

The statistical analysis should include pre-specified sensitivity analyses for assessing the impact of the assumptions made for the primary and important secondary analyses on the results of the study (ICH E9(R1)). For example, if the analysis relies on a particular assumption about the reasons for missing data, sensitivity analyses should be planned to assess the impact of that assumption on the study results. In the case of observational studies, sensitivity analyses might, for example, consider additional potential confounders.
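The idea of varying a missing-data assumption can be illustrated with a simple extreme-case sketch for a binary responder endpoint. This is a hypothetical example and not a method specified in this guideline or in ICH E9(R1): the function `risk_difference`, its parameters, and all counts are invented for illustration, and a real sensitivity analysis would be chosen to match the estimand and pre-specified in the protocol or statistical analysis plan.

```python
def risk_difference(resp_t, miss_t, n_t, resp_c, miss_c, n_c, frac_t, frac_c):
    """Difference in responder rates (treated minus control).

    frac_t and frac_c are the assumed fractions (0.0 to 1.0) of
    participants with missing outcomes who are counted as responders
    in the treated and control groups, respectively.
    """
    rate_t = (resp_t + frac_t * miss_t) / n_t
    rate_c = (resp_c + frac_c * miss_c) / n_c
    return rate_t - rate_c


if __name__ == "__main__":
    # Hypothetical counts: 100 participants per group.
    counts = dict(resp_t=60, miss_t=10, n_t=100, resp_c=45, miss_c=8, n_c=100)
    scenarios = {
        "missing counted as non-responders": (0.0, 0.0),
        "least favourable to the drug": (0.0, 1.0),
        "most favourable to the drug": (1.0, 0.0),
    }
    for label, (frac_t, frac_c) in scenarios.items():
        diff = risk_difference(frac_t=frac_t, frac_c=frac_c, **counts)
        print(f"{label}: {diff:.2f}")
```

If the conclusion holds across the range of assumptions considered plausible, it is less dependent on the particular assumption made in the primary analysis.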

Pre-specification of the analysis approach is particularly important for studies that make use of existing data sources rather than primary data collection (Section 5.7), not only for the statistical analysis planned for the study but also for any feasibility analysis to assess the applicability of the existing data. For example, for a single-arm interventional study with an external control, the specifics of the external control should be defined prior to the conduct of the interventional aspect of the study. Pre-specification of the analysis should be in place so that any review of the existing data sources prior to the design of the study does not threaten the study integrity.

The statistical analysis should be carried out in accordance with the prospectively defined analysis plan, and all deviations from the plan should be indicated in the study report (ICH E3 Clinical Study Reports).

5.7 Study Data

Study data comprise all information generated, collected, or used in the context of the study ranging from existing source data to study-specific assessments. The study data should contain the necessary information to conduct the statistical analysis specified in the protocol and statistical analysis plan, as well as to monitor for participant safety, protocol adherence, and data integrity.

Study data can be broadly classified into two types: (1) data generated specifically for the present study (primary data collection) and (2) data obtained from sources external to the present study (secondary data use). Data generated for the study may be collected via case report forms, laboratory measurements, electronic patient reported outcomes, or mobile health tools. Examples of external sources of data include historical clinical studies, national death databases, disease and drug registries, claims data, and medical and administrative records from routine medical practice. A study may make use of both types of data.

For all data sources, procedures to ensure the protection of personal data of the individuals being studied should be implemented. The study protocol, and if applicable the informed consent, should explicitly address the protection of personal data. Regulations related to protection of individuals’ data need to be followed. When considering data from external sources, it is important to ascertain whether the regulatory authorities accept the use of such data for purposes other than the original intent.

Study data should be of sufficient quality to address the objectives of the study and, in interventional studies, to monitor participant safety. Data quality attributes include consistency (uniformity of ascertainment over time), accuracy (correctness of collection, transmission, and processing), and completeness (lack of missing information). These aspects should be proactively considered during study planning by identifying the factors, critical to the quality of the study, associated with data sourcing, collection, and processing.
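Of the attributes above, completeness is the most direct to quantify. The following minimal sketch computes, for each field, the fraction of records with a non-missing value. The function name `completeness`, the field names, and the records are hypothetical, and treating `None` or an empty string as missing is an assumption of the example; consistency and accuracy usually need study-specific checks that are not shown.

```python
def completeness(records, fields):
    """Return, per field, the fraction of records with a non-missing value.

    A value is treated as missing if the key is absent, or if the value
    is None or an empty string (an assumption of this sketch).
    """
    total = len(records)
    result = {}
    for field in fields:
        present = sum(1 for record in records
                      if record.get(field) not in (None, ""))
        result[field] = present / total if total else 0.0
    return result


if __name__ == "__main__":
    # Hypothetical records with some missing values.
    records = [
        {"id": 1, "systolic_bp": 120, "visit_date": "2024-01-01"},
        {"id": 2, "systolic_bp": None, "visit_date": "2024-01-02"},
        {"id": 3, "visit_date": ""},
        {"id": 4, "systolic_bp": 131, "visit_date": "2024-01-04"},
    ]
    for field, fraction in completeness(records, ["systolic_bp", "visit_date"]).items():
        print(f"{field}: {fraction:.0%} complete")
```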

The use of standards for data recording and coding (or recoding) is important to support data reliability, facilitate correct analysis and interpretation of results, and promote data sharing. Internationally accepted data standards exist for many sources of study data and should be used where applicable.

With primary data collection, the methods and standards established for use at the point of capture and the subsequent processing provide an opportunity to prospectively ensure the quality of the data.

With secondary data use, the relevance of the available data should be considered and clearly described in the study protocol. For example, when using existing electronic health record data to ascertain the study endpoint rather than through primary data collection, information in the health record about outcomes may need to be converted to the study endpoint.

In some cases, secondary data use may not be sufficient for all aspects of the study and may need to be supplemented by primary data collection. The quality of data collected for a different purpose should be evaluated when re-used in the context of the present study. Careful quality control processes may have been applied during their acquisition; where used, those processes were not necessarily designed with the objectives of the present study in mind.

There are several additional considerations with secondary data use. For example, methods to conceal the treatment should be considered when selecting and prior to analysing data from external sources. As another example, absence of affirmative information on a condition or event does not necessarily mean the condition or event is not present. There may also be a delay between the occurrence of events and their appearance in existing data sources. To the extent possible, uncertainties and potential sources of bias should be addressed at the study design stage, during data analysis, and in the interpretation of the study results.

6. CONDUCT, SAFETY MONITORING, AND REPORTING

6.1 Study Conduct

The principles and approaches set out in this guideline, including those of quality by design, should inform the approach taken to the conduct and reporting of clinical studies. Risk proportionate mitigation measures should be employed to ensure the integrity of the critical to quality factors.

6.1.1 Protocol Adherence

Adherence to the study protocol and other relevant documents is essential, and many aspects of adherence should be considered among the study’s critical to quality factors. Successful application of the quality by design principles may minimise the need for modifications to the protocol and make adherence throughout the study more likely. If modification of the protocol becomes necessary, a clear description of the rationale for the modification should be provided in a protocol amendment, and the impact of the modification on study conduct should be carefully considered.

6.1.2 Training

Individuals involved in study conduct should receive training commensurate with their role in the study and this training should occur prior to their becoming involved in the study. Updated training or retraining may be needed to address issues related to critical to quality factors observed during the course of the study, and/or implement protocol modifications.

6.1.3 Data Management

The manner and timelines in which study data are collected and managed are critical contributors to overall study data quality. Operational checks, centralised data monitoring, and statistical surveillance can identify important data quality issues for corrective action. Data management procedures should account for the diversity of data sources in use for clinical studies (Section 5.7). For interventional clinical studies, further guidance on data management is available in ICH E6.

6.1.4 Access to Interim Data

Inappropriate access to data during the conduct of the study may compromise study integrity (Sections 5.5 and 5.6 and ICH E9). In studies with planned interim analyses, special attention should be given to which individuals have access to the data and results. Even in studies without planned interim analyses, special attention should be paid to any ongoing monitoring of unblinded data to avoid inappropriate access.

6.2 Participant Safety during Study Conduct

Important standards of ethical conduct and the protection of participants in clinical studies are described in Section 2.1. This section describes safety related considerations during the conduct of the study.

6.2.1 Safety Monitoring

The goals of safety monitoring are to protect study participants and to characterise the safety profile of the drug. Procedures and systems for the identification, monitoring, and reporting of safety concerns during the study should be clearly specified. The approach should reflect the type and objectives of the study, the risks to the study participants and what is known about the drug and the study population. Guidance is available on reporting of safety data to appropriate authorities and on the content and timing of safety reports (ICH E2-E2F Pharmacovigilance, and, for interventional clinical trials in particular, ICH E6).

6.2.2 Withdrawal Criteria

Clear criteria for stopping treatment or study procedures for a study participant while remaining in the study are necessary to ensure the protection of the participants but should also minimise loss of critical data.

6.2.3 Data Monitoring Committee

An important component of safety monitoring in many clinical studies is the use of an independent data monitoring committee. This group monitors accumulating data while the study is being conducted to make recommendations on whether to continue, modify, or terminate a study.

During programme planning, the need for an independent data monitoring committee to monitor safety data across studies in a development programme should also be assessed. If a data monitoring committee is needed for either an individual study or across the development programme, procedures governing its operation, and in particular the review of unblinded data in an interventional trial while preserving study integrity (ICH E9), should be established prior to study start.

6.3 Study Reporting

Clinical studies and their results should be adequately reported using formats appropriate for the type of study (interventional or observational studies) and information being reported. ICH E3 focuses particularly on the report format for interventional clinical trials, but the basic principles may be applied to other types of clinical studies (ICH E3 Q&A). The design of the study report should be part of the quality by design process. The report should describe the critical to quality factors in the study. The reporting of study results should be comprehensive, accurate, and timely.

Consideration should be given to providing a factual summary of the overall study results to study participants in an objective, balanced and nonpromotional manner, including relevant safety information and any limitations of the study. In addition, consideration could be given to providing individual participants with information about their study specific results (e.g., their treatment arm, test results). The information should be conveyed by someone involved in the health management of the participant (e.g., the clinical investigator). Participants should be informed about the information they will receive and when they will receive it at the time of providing informed consent.

The transparency of clinical research in drug development includes the registration of clinical studies, before they start, on publicly accessible and recognised databases, and the public posting of clinical study results. Adopting such practices for observational studies also promotes transparency. Making objective and unbiased information publicly available can benefit public health in general, as well as the indicated patient populations, through enhancing clinical research, reducing unnecessary clinical studies, and informing decisions in clinical practice.

7. CONSIDERATIONS IN IDENTIFYING CRITICAL TO QUALITY FACTORS

The identification of critical to quality factors should be supported by proactive, cross-functional discussions and decision making at the time of study planning, as described in Section 3. Different factors will stand out as critical for different types of studies, following the concepts introduced in Sections 4 through 6.

In designing a study, the following aspects should be considered, where applicable, to support the identification of critical to quality factors:

Engagement of all relevant stakeholders, including patients, is considered during study planning and design.
The prerequisite non-clinical studies, and where applicable, clinical studies, are complete and adequate to support the study being designed.
The study objectives address relevant scientific questions appropriate for a given study’s role in the development programme, taking into account the accumulated knowledge about the product.
The clinical study design supports a meaningful comparison of the effects of the drug when compared to the chosen control group.
Adequate measures are used to protect participants’ rights, safety, and welfare (informed consent process, Institutional Review Board/Ethics Committee review, investigator and clinical study site training, pseudonymisation).
Information provided to the study participants should be clear and understandable.
Competencies and training required for the study by sponsor and investigator staff, relevant to their role, should be identified.
The feasibility of the study should be assessed to ensure the study is operationally viable.
The number of participants included, the duration of the study, and the frequency of study visits are sufficient to support the study objective.
The eligibility criteria should be reflective of the study objectives and be well documented in the clinical study protocol.
The protocol specifies the collection of data needed to meet the study objectives, understand the benefit/risk of the drug, and monitor participant safety.
The choice of response variables and the methods to assess them are well-defined and support evaluation of the effects of the drug.
Clinical study procedures include adequate measures to minimise bias (e.g., randomisation, blinding).
The statistical analysis plan is pre-specified and defines the analysis methods appropriate for the endpoints and the populations of interest.
Systems and processes are in place that support the study conduct to ensure the integrity of critical study data.
The extent and nature of study monitoring are tailored to the specific study design and objectives and the need to ensure participants’ safety.
The need for and appropriate role of a data monitoring committee is assessed.
The reporting of the study results is planned, comprehensive, accurate, timely, and publicly accessible.

These considerations are not exhaustive and may not apply to all studies. Other aspects may need to be considered to identify the critical to quality factors for each individual study.

ANNEX: TYPES OF CLINICAL STUDIES

Drug development is ideally a logical, stepwise process in which information from early studies is used to support and plan later studies. The actual sequence of studies conducted in a particular drug development programme, however, may reflect different dependencies and overlapping study types. Studies may also involve adaptive designs (which may bridge or combine different study types as listed below) or designs that are intended to investigate multiple drugs or multiple indications or both (e.g., studies conducted under a master protocol). In the table below, types of clinical studies are categorised by objectives. Illustrative examples, not intended to be exhaustive or exclusive, are provided. Study objectives appearing under one type may also occur under another.

Type of Study	Objective(s) of Study	Study Examples
Human Pharmacology	Assess tolerance and safety; Define/describe clinical PK¹ and PD²; Explore drug metabolism and drug interactions; Evaluate activity, assess immunogenicity; Assess renal/hepatic tolerance; Assess cardiac toxicity	BA³/BE⁴ studies under fasted/fed conditions; Dose-tolerance studies; Single and multiple-rising dose PK and/or PD studies; Drug-drug interaction studies; QTc prolongation study; Human factor studies for drug delivery devices
Exploratory	Explore use for the intended indication; Estimate dose/dosing regimen for subsequent studies; Explore dose-response/exposure-response relationship; Provide basis for confirmatory study design (e.g., targeted population, clinical endpoints, patient reported outcome measures, factors affecting treatment effects)	Randomised controlled clinical trials of relatively short duration in well-defined narrow patient populations, using surrogate or pharmacological endpoints or clinical measures; Dose finding studies; Biomarker exploration studies; Studies to validate patient reported outcomes; Adaptive designs that may combine exploratory and confirmatory objectives
Confirmatory	Demonstrate/confirm efficacy; Establish safety profile in larger, more representative patient populations; Provide an adequate basis for assessing the benefit/risk relationship to support licensing; Establish dose-response/exposure-response relationship; Establish safety profile and confirm efficacy in specific populations (e.g., paediatrics, elderly)	Randomised controlled clinical trials to establish efficacy in larger, more representative patient populations; Dose-response studies; Clinical safety studies; Studies of mortality/morbidity outcomes; Studies in special populations; Studies that seek to demonstrate efficacy for multiple drugs in a single protocol
Post-Approval	Extend understanding of benefit/risk relationship in general or special populations and/or environments; Identify less common adverse reactions; Refine dosing recommendations	Comparative effectiveness studies; Long-term follow-up studies; Studies of mortality/morbidity or other additional endpoints; Large, simple randomised trials; Pharmacoeconomic studies; Pharmacoepidemiology studies; Observational studies of the use of the drug in clinical practice; Disease or drug registries
¹ PK: Pharmacokinetic; ² PD: Pharmacodynamic; ³ BA studies: Bioavailability; ⁴ BE studies: Bioequivalence

ICH E8

ICH E8(R1) 临床试验的一般考虑

English

ICH E8(R1) General Considerations for Clinical Studies

1. 本文件的目的

实施药物临床研究是为了获得信息，这些信息最终可使患者获得安全有效的药品，对患者产生有意义的影响，同时保护研究受试者。本文件为临床研发生命周期提供指南，包括临床研究相关质量设计，同时考虑到临床研究设计和使用的数据来源的广泛性。

ICH 文件“临床研究的一般考虑”旨在：

1. 阐述在临床研究设计和实施中被国际公认的原则和惯例，以确保保护受试者和促进监管机构接受数据和结果。

2. 提供在产品生命周期中的临床研究设计和实施方面质量考虑相关的指南，包括在研究计划期间确定研究关键质量因素，以及在研究实施过程中对这些因素的风险管理。

3. 提供在产品生命周期中开展临床研究类型的概述，并阐述用于支持关键质量因素识别的研究设计要素，以确保对研究受试者的保护、数据的完整性、结果的可靠性及研究能够实现其目的的能力。

4. 提供ICH 有效性文件的指南，以便于使用者查阅。

本文件的第2 节阐述了临床研究设计的一般原则，第3节中讨论了临床研究相关质量设计。第4 节提供了药物研发计划的简要概述，以及在整个产品生命周期中通过不同类型的研究获得信息以推进药物研发。第5 节描述了临床研究设计的要素，这些要素反映了药物研发中使用的各种设计以及可用的数据来源范围。第6 节讨论研究实施、确保研究受试者安全和研究报告。第7 节提供了对确定研究中关键质量因素的一些考虑事项。

ICH 有效性指导原则是一套涵盖临床研究的计划、设计、实施、安全、分析和报告的完整指南。ICH E8 全面介绍了临床研发，临床研究中的质量设计，并重点关注研究关键质量因素。应以综合、整体的方式考虑和使用指南，而不是仅关注一个指南或子章节。

就本文件而言，临床研究是指在产品生命周期的任何阶段，包括上市许可前和上市许可后，在人体中开展的一项或多项药物研究。关注点主要在支持监管决策的临床研究，认识到这些研究也可能为卫生政策决策、临床实践指南或其他行动提供信息。术语“药物”应被视为与治疗用、预防用或诊断用药物同义。术语“药物批准”指获得该药物的上市许可。

2. 一般原则

2.1 临床研究受试者的保护

临床研究伦理准则和保护受试者（包括特殊人群）的重要原则起源于赫尔辛基宣言，与人体相关临床研究必须遵循这些原则。这些原则在其他ICH 指南中有所阐述，特别是ICH E6 药物临床试验质量管理规范。

正如E6 指南所述，机构审查委员会/独立伦理委员会、研究者及申办者共同承担保护研究受试者的责任。

应根据适用的法规和法律要求保护可识别受试者身份的机密信息。

在开始临床研究之前，应获得足够的信息，以确保该药物在计划的人体研究中具有可接受的安全性。当出现新的非临床、临床、药物质量数据时，应由有资历的专家对这些数据进行审查和评价，以评估对研究受试者安全的潜在影响。应考虑新获得的信息，根据需要适当调整正在进行和将要开展的研究，以保护研究受试者。在整个药物研发过程中，应注意从科学角度出发，以保证所有研究程序及评估的必要性，不会给研究受试者带来过度的负担。

2.2 临床研究设计、计划、实施、分析及报告的科学方法

临床研究的实质是提出重要问题并通过适当的研究回答问题。任何研究的主要目的都应该反映研究问题，并被清晰明确表述。临床研究应遵循合理的科学原则进行设计、计划、实施、分析和报告，以达到其目的。

在本文件中，临床研究的质量应与临床研究目的相符。临床研究目的是生成可靠信息，以回答研究问题并支持决策，同时保护研究受试者。因此，所生成的信息质量应足以支持良好的决策。

临床研究中的质量源于设计，旨在确保通过将质量设计到研究方案和过程中来主动推动研究质量。这涉及使用前瞻性、多学科方法，以与涉及的风险相称的方式提高方案和过程设计的质量，并就如何实现这一目标进行明确沟通。

在产品生命周期中，不同类型的研究具有不同的目的和设计，并可能会涉及不同数据来源。就本指导原则而言，研发计划被视为涵盖产品整个生命周期（第4 节）。附录提供了药物研发不同阶段根据目的进行的广泛分类。应严格设计研究，以解决研究目的，并注重设计要素，例如：研究人群和反应变量的选择以及使结果偏倚最小化的方法（第5 节）。

进行系列研究背后的主要逻辑是，既往研究的结果应该为之后的研究计划提供依据。新获得的数据往往会促使研发策略的调整。例如：确证性研究的结果可能提示需要进行额外的人体药理学研究。

由于药物研发计划日益全球化，在ICH 指导原则的协调下，多区域数据的可用性最大限度地减少了在不同区域进行个别研究的需要。一项研究的结果通常用于多个地区的监管申请，设计还应考虑研究结果与进行研究地区以外地区的相关性。ICH E5 种族因素、ICH E6、ICH E17 多区域临床试验提供了进一步的指导。

鼓励监管机构早期介入，以了解当地/地区的要求和期望，并有助于促进将质量设计融入研究中的能力。

2.3 患者参与药物研发

在药物研发过程中，倾听患者和/或患者组织的意见，有助于确保获取来自患者的观点。患者（或其看护人员/父母）的意见在药物研发的所有阶段均有价值。在设计的早期阶段让患者参与研究可能会增加其对研究的信任，促进招募并提高依从性。患者还可提供其对所患疾病的看法，这可能有助于决策，例如：对患者有意义的终点、选择合适的研究人群和研究持续时间，以及使用可接受的对照。从而最终支持研发更适合患者需求的药物。

3. 临床研究相关质量设计

质量源于设计的临床研究方法（第3.1 节）涉及关注关键质量因素，以确保保护研究受试者的权利、安全以及福祉，生成可靠、有意义的研究结果，以及使用基于风险的方法对这些因素进行风险管理（第3.2 节）。质量源于设计主要依靠在研究设计、计划以及整个研究的实施、分析和报告过程中，通过建立适当的框架来确定和审查关键质量因素（第3.3 节）。

3.1 临床研究的质量源于设计

质量是临床研究设计、计划、实施、分析和报告的主要考虑因素，也是临床研发计划的必要组成部分。通过对于研究方案、程序、相关实施计划和培训等所有组成部分设计的前瞻性关注，可以显著提高临床研究回答研究问题的可能性，同时防止重要错误的发生。回顾性活动如文件、数据审查和监查是质量保证流程的重要组成部分；但是，即使与稽查相结合，也不足以确保临床研究质量。

良好的临床研究规划和实施源于对第5 节中所述的临床研究设计要素的关注，例如：

需要明确预先确定的以解决主要科学问题的研究目的；
选择具有正在研究疾病、状况或分子/基因谱的合适受试者；
使用偏倚最小化的方法，例如：随机化、盲法或遮蔽、和/或控制混杂因素；
明确、可测量、具有临床意义且与患者相关的终点。

实施标准也很重要，例如：确保清楚地了解研究的可行性、选择适宜的研究中心、确保专业分析和检测设施及程序的质量以及保证数据完整性的流程。

3.2 关键质量因素

应为每项研究确定与保证研究质量相关的一系列基本因素。应对那些影响研究质量的重要因素予以重视。这些关键质量因素是一项研究的本质，其完整性是研究受试者保护、研究结果可靠性和可解释性，以及基于研究结果作出决策的基础。这些质量因素至关重要，因为如果它们的完整性因设计或实施差错而破坏，基于研究结果决策的可靠性或伦理原则也会被破坏。还应全面考虑关键质量因素，以便确定它们之间的依赖关系。本文件第7 节提供了有助于确定研究关键质量因素的考虑事项。

临床研究设计应反映针对药物的认知和经验状态；治疗、诊断或预防的疾病；潜在的生物学机制（包括疾病和治疗）；以及药物所针对的目标人群。随着研究的进展，对药物的认知不断增加，药理学、安全性和有效性的不确定性随之降低。研发过程中任何阶段，对药物的了解都将持续为关键质量因素的识别和用于管理这些因素的控制过程提供指导信息。

申办者和其他各方基于质量源于设计理念设计研究时应确定关键质量因素。在确定了这些因素之后，重要的是确定威胁其完整性的风险，并根据其概率、可检测性和影响来决定是否可以接受或应该降低这些风险。在确定应降低风险的情况下，应制定必要的控制流程并进行沟通，并采取必要措施降低风险。此处使用的术语“风险”在一般风险管理方法背景下，适用于研究的所有因素。
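
下面给出一个最简示意（仅为说明，并非本指导原则规定的方法），用 Python 演示如何按概率、可检测性和影响对威胁关键质量因素的风险进行排序。其中 1 至 5 的评分尺度、三项评分相乘的计分方式、阈值 27，以及 `Risk`、`risks_to_mitigate` 等名称均为示例假设，实际判定标准应由申办者结合具体研究确定。

```python
from dataclasses import dataclass


@dataclass
class Risk:
    """一项威胁关键质量因素的风险（评分尺度为示例假设）。"""
    description: str
    probability: int    # 发生概率：1（极少）至 5（频繁）
    detectability: int  # 可检测性：1（易于发现）至 5（难以发现）
    impact: int         # 影响程度：1（轻微）至 5（严重）

    def score(self) -> int:
        # 假设的计分方式：三项评分相乘，分值越高越需要优先处理
        return self.probability * self.detectability * self.impact


def risks_to_mitigate(risks, threshold=27):
    """返回得分不低于阈值的风险，按得分从高到低排列。"""
    selected = [r for r in risks if r.score() >= threshold]
    return sorted(selected, key=lambda r: r.score(), reverse=True)


# 假设的示例风险清单
example_risks = [
    Risk("主要终点评估时间窗偏离", probability=4, detectability=3, impact=5),
    Risk("次要问卷个别条目缺失", probability=1, detectability=2, impact=2),
    Risk("入组标准理解不一致", probability=3, detectability=3, impact=3),
]

for risk in risks_to_mitigate(example_risks):
    print(risk.score(), risk.description)
```

低于阈值的风险可视为可接受并予以记录；高于阈值的风险则应制定相应的控制流程并进行沟通。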

积极主动地沟通关键质量因素及采取的风险缓解措施将有助于申办者和研究中心了解优先事项和资源分配情况。积极的支持（例如：对研究中心工作人员进行与其工作职责相关的培训、在研究方案中阐述关键质量因素以及潜在降低风险措施）将促进研究方案、程序以及相关的操作计划和流程设计的正确实施。

难以保证一项措施的各个方面都达到完美，或者只能通过使用与获得的利益不成比例的资源来实现。在设计研究时，应优先确定对研究至关重要的质量因素，研究程序应与研究中固有的风险及所收集信息的重要性相匹配。关键质量因素应该明确，且不应被次要问题所混淆（例如：与适当的研究受试者保护和/或主要研究目的无关的大量的次要目的或流程/数据收集）。

3.3 确定关键质量因素的方法

以质量方法来设计研究的关键方面是询问研究所要解决的目的是否清晰明确；研究设计可否解决其提出的研究问题；这些问题是否对患者有意义；以及研究假设是否具体且科学有效。确定关键质量因素的方法应考虑所选设计和数据来源是否能够很好且最有效地实现这些目的。在研究设计过程中尽早与患者沟通有助于这种方法，并最终有助于确定关键质量因素。研究设计应具有实施可行性，避免不必要的繁复。方案和病例报告表/数据的收集方法应保证研究能够按照设计实施，并避免不必要的数据收集。

包含下列因素的方法将有利于确定关键质量因素：

3.3.1 建立支持开放式对话的文化

鼓励创造一种文化，重视和奖励集思广益，并就某一研究或研发项目的关键质量因素进行开放、积极的对话，而不仅仅依赖工具和清单。开放式对话有利于建立可确保质量的创新方法。

不鼓励不灵活的“一刀切”方法。标准化的操作规程对于开展高质量的临床研究是必要和有益的，但也需要研究特定策略和行动来有效和高效地支持研究质量。

在研究开始前及研究期间，应以透明的方式收集和审查用于研究设计的证据，同时承认存在和已知的数据中的差距，并预测可能出现的此类差距或冲突。

3.3.2 关注对研究至关重要的活动

应重点关注对患者和公众健康研究结果的可靠性和意义、受试者安全、受试者研究伦理准则等至关重要的活动。应考虑从研究中删除不必要的活动和数据收集，通过简化流程、提高研究效率和将资源用于关键领域等方式来提高质量。对资源进行部署，以识别和防止或控制重要错误。

3.3.3 利益相关者参与研究设计

临床研究设计最好参考广泛的利益相关者的意见，包括患者和医疗保健服务提供者。临床研究设计应公开接受来自外部以及申办者组织内部的专家和利益相关者的质疑。

可通过直接参与并成功完成研究的人员（例如：临床研究者、研究协调员和其他中心工作人员以及患者/患者组织）为研究质量创建过程提供信息。临床研究者和潜在研究受试者可对招募符合拟定入选标准受试者的可行性、安排的研究访视和程序是否可能过于繁重而导致提前脱落，以及研究终点和研究设置与目标患者人群的一般相关性等提供有价值的见解。他们还可基于目标患者人群中的伦理问题、文化、地区、人口统计学和其他亚组特征的背景提供治疗价值方面的见解。

鼓励尽早与监管机构沟通，尤其是当一项研究出现可能影响质量的新关键要素（例如：确定患者人群、程序或终点）时。

3.3.4 审查关键质量因素

应运用积累的经验和知识，结合关键质量因素的定期审查结果，确定是否需要调整风险控制机制，因为一旦研究开始，可能会出现新发或意想不到的问题。

在对关键质量因素和风险管理进行前瞻性规划和持续审查期间，需要特别关注具有适应性特征和/或中期决策点的研究（ICH E9 临床试验的统计学原则）。

3.3.5 操作实践中的关键质量因素

一项成功研究的基础是具有一个既科学合理又操作可行的方案。从操作角度而言，可行性评估涉及考虑可能影响临床研发成功完成的研究设计和实施要素。

研究可行性的考虑事项还包括但不限于：医疗实践和患者人群的地区差异、具有开展临床研究（ICH E6）经验的合格研究者/中心人员的可用性、成功实施研究所需设备和设施的可用性、目标患者人群的可用性，以及招募足够数量的受试者以达到研究目的的能力。研究受试者的保留和随访也是关键质量因素。考虑这些以及其他与研究可行性相关的关键质量因素可为研究设计提供信息，并提高实施质量。

4. 药物研发计划

本节概述在制定药物研发计划时考虑的一般原则。药物研发计划应遵循科学研究和良好研究设计的原则，从而确保结果的可靠性和可解释性。高效的药物研发包括在整个研发过程中与监管机构进行有适当计划的沟通，以确保符合产品质量要求以及支持药物用于特定状况或疾病的批准，包括为解决遗留问题而开展批准后研究。在整个过程中，应重点关注对研究受试者的权利、安全和福祉的保护。

药物研发计划建立在整个研究过程中获取的知识的基础上，以降低从靶标识别到非临床和临床评价过程中的不确定性。该计划包括含化学、生产和控制的药品质量（CMC）、非临床和临床研究（批准前和批准后）。建模与模拟可指导药物研发的全过程。研发计划还可能包括产品进入市场的区域性考虑因素，如卫生技术评估。

重要的是，确保贯穿药物整个生命周期来收集与药物研发和评价相关的利益相关者的经验、观点、需求和重要事项，并有的放矢地整合至药物研发计划中。

临床研发还可能要求共同研发有利于药物安全有效使用的经过验证的生物标志物、诊断检测或器械。

可能有助于药物研发的研究类型在第4.2 节和第4.3 节阐述，并在附录中予以总结。

4.1 试验用药的质量

在制定药物研发计划时，确保试验用药品的质量和理化性质的表征符合要求是重要因素，并在ICH 和区域性质量指导原则中予以阐明。复合物或生物产品可能需要进行更为广泛的表征。应在药物研发计划中充分表征药物处方，包括生物利用度信息（酌情），药物处方还应与药物研发阶段和目标患者人群相匹配。当计划在儿童人群中开展临床研究时，可考虑研发适合相应年龄段的处方（ICH E11-E11A 在儿科人群中开展的临床试验）。

药物质量评价可能会延伸至给药所需的器械或用于识别目标人群的伴随诊断。

研发过程中产品如发生变更，应提供可比性数据予以支持，以确保研究结果在整个研发计划中自始至终能够解释。其中包括通过生物等效性研究或其他方式确立不同处方之间的关系。

4.2 非临床研究

非临床安全性研究指导原则参见ICH M3 非临床安全性研究、ICH 安全性（S）指南和相关问答文件以及区域性指导原则。非临床评估通常包括毒理学、致癌性、免疫原性、药理学、药代动力学和其他用于支持临床研究的评估（并且可能包括利用体内和体外模型以及通过建模与模拟）。非临床研究的范围及相对于临床研究的实施时间取决于影响药物进一步研发的各种因素，例如：药物的化学或分子特性；主要作用的药理学基础（作用机制）；给药途径；吸收、分布、代谢和排泄（ADME）；对器官系统产生的生理学效应；剂量/浓度-效应关系；代谢物；以及作用持续时间和使用期限。如药物用于特殊人群（例如：妊娠期或哺乳期女性、儿童），则可能需要额外开展非临床评估。应参阅为支持在特殊人群中开展人体临床研究而实施非临床安全性研究时所遵循的指导原则（例如：ICH S5 生殖毒理学、S11 非临床儿童安全性和M3）。

非临床研究的评估，包括药物的药理学和毒理学研究，应服务于临床研究设计和拟定用途。在开展人体研究之前，应获取足够的非临床信息支持初始人体剂量和暴露持续时间的选择。

4.3 临床研究

临床药物研发是指在人体中研究药物，基于从非临床和既往开展的临床研究积累的知识按一定的顺序实施。药物研发计划的结构受诸多因素的影响，该计划由目的、设计方式和依赖关系各不相同的若干研究组成。关于示例研究及其目的的说明性清单见附录。虽然通常临床药物研发分为四个阶段（1 期-4 期），但需要明确的是，阶段的概念只是便于描述，并非规定，并且药物研发的阶段可能重叠或合并。

为了高效地研发新药，必须在研发早期阶段识别药物的特征，并根据其特征制定合理的研发计划。最初开展的临床研究可能在规模和持续时间上受到更多限制，以便就短期用药的安全性和耐受性开展早期评估，并对有效性进行概念验证。早期研究收集药效学、药代动力学以及支持剂量范围和/或给药方案选择所需的其他信息，从而为后续开展的临床研究提供有用的信息。随着获取的药物信息的增加，临床研究的规模得以扩大，持续时间延长，研究人群更加多样化，除主要有效性指标外，还可能纳入更多次要终点。在整个研发过程中，新数据可能提示需要开展额外研究。

生物标志物的使用有助于研发出更加安全有效的药物、指导剂量选择，并改善药物的获益-风险特征（参见ICH E16基因组生物标志物的鉴定），可在药物研发的整个过程中予以考虑。在临床研究中可以评价生物标志物的使用，从而更好地遴选出获益概率高且不良反应发生概率低的患者，或作为可预测临床反应的中间终点。

下文阐述了从首次人体研究至后期研发和批准后研究的整个临床研发阶段开展的各种类型的研究。

4.3.1 人体药理学

在设计早期临床研究时，务必将研究受试者的保护视为第一要务，特别是试验药物首次用于人体给药（通常称为1期研究）时。这些研究可在健康志愿者中进行，也可在受累于某种状况或疾病的患者群体中进行，取决于药物特性和研发计划的目的。

此类研究通常拟用于解决以下一个或多个方面的问题：

4.3.1.1 初步安全性和耐受性评估

人体首次和后续给药通常旨在确定药物在后续临床研究中需要进一步评价的剂量范围内的耐受性，亦可确定可能出现的不良反应的性质。通常采用单次给药和多次给药的方式。

4.3.1.2 药代动力学

虽然药物吸收、分布、代谢和排泄特征研究贯穿于整个研发计划，但初步的特征描述是一个重要的早期目标。药代动力学研究对评估药物的清除、预测原型药物或其代谢物是否蓄积、与代谢酶和转运体的相互作用以及潜在的药物间相互作用等方面尤为重要。部分药代动力学研究通常在后续的研发阶段进行，以回答更加特殊的问题。对于口服药物，评价食物对生物利用度的影响对获取与食物有关的给药说明是重要的。应考虑代谢或排泄能力存在潜在差异的亚组人群的药代动力学信息，例如：肾或肝功能损害的患者、老年患者、儿童和各种种族亚组（分别参见ICH E4 剂量-效应研究、E7 在老年人群中开展的临床试验、E11 以及E5）。

4.3.1.3 药效学与药物活性的早期测定

基于药物的性质和关注的终点，药效学研究以及与血药浓度-效应相关的研究（PK/PD 研究）可以在健康志愿者或受累于特定状况或疾病的患者中实施。如果有适当的衡量指标，药效学数据可以提供药物活性与有效性的早期评估数据，并为有待在后续研究中评估的给药剂量和给药方案提供参考。

4.3.2 探索性与确证性的安全性与有效性研究

从初步临床研究充分获取有关安全性、临床药理学和剂量方面的信息后，继续开展探索性研究和确证性研究（通常分别称为2 期和3 期），以进一步评价药物安全性和有效性。该研究目的可在单个或少量研究中同时实现，这取决于药物性质和患者人群。探索性与确证性研究可根据研究目的的不同而采用多种研究设计。

探索性研究旨在考察药物在特定患者群体中的安全性和有效性。此外，探索性研究的目的在于提炼有效剂量和治疗方案，细化目标人群的定义，确保药物安全性特征的稳健性，并包括对后续研究中采纳的潜在研究终点的评价。探索性研究可提供有关识别和确定影响治疗效果因素的信息，并结合建模与模拟，有助于支持随后的确证性研究设计。

确证性研究旨在确证早期临床研究中积累的关于药物在预期用途和用药人群中的安全性和有效性的初步证据。确证性研究通常旨在为药物上市批准提供充分的依据，并为药物的使用和官方公布的制剂信息提供充分的说明。此外，确证性研究在受累于特定状况或疾病或面临此类风险的受试者（即一旦获得批准后将使用该药物的人群）中评价药物，可能包括在受累于频发或潜在相关合并症（例如：心血管疾病、糖尿病、肝和肾功能损害）的患者亚群中开展研究，以确定药物在这类患者中的安全性和有效性。

确证性研究可评价多于一种剂量、或在不同疾病阶段用药、或联合一种或多种其他药物使用时的有效性和安全性。如为拟长期使用的药物，应开展涉及长期暴露的研究（ICH E1长期用药的临床安全性）。随访持续时间取决于药物效应的持续时间，与预期用药持续时间无关。

为确证性研究选择的研究终点应具有临床相关性，并反映疾病负担，或在预测疾病负担或后遗症方面具有充分的替代性。

4.3.3 特殊人群

在药物研发中，针对一般人群中的部分群体需要开展额外研究，因为这部分群体独特的风险/获益考虑，或预期需要调整药物剂量或给药方案。ICH E5 和E17 为评估种族因素对药物效应产生的影响提供了一个框架。在弱势人群中开展研究时，应特别重视与知情同意相关的伦理学考虑（ICH E6 和E11）。可于药物研发的任何阶段在特殊人群中开展研究，以了解药物在该类人群中的作用。以下是针对特殊人群的一些考虑：

4.3.3.1 在妊娠女性中开展的研究

研究妊娠期女性可能使用的药物具有重要意义。如妊娠期女性志愿者入组临床研究，或受试者在参加临床研究期间妊娠，必须随访妊娠及结局并报告结局。

4.3.3.2 在哺乳期女性中开展的研究

应酌情检测药物或其代谢物向人乳中的分泌情况。如果哺乳期女性入组临床研究，应同时监测药物对乳儿产生的影响。

4.3.3.3 在儿童中开展的研究

ICH E11 提供了在儿科人群中研发药物的关键问题纲要，以及在儿科人群中安全有效和符合伦理的研究方法。

4.3.3.4 在老年人群中开展的研究

ICH E7 提供了在老年人群中研发药物的关键问题纲要，以及在老年人群中安全有效和符合伦理的研究方法。

4.3.4 批准后研究

药物获批后，可能需要开展额外的研究（通常称为4 期研究），以进一步了解药物在批准适应症中的安全性和有效性。获批后研究并非药物获批所必需，但对优化药物的使用通常具有重要意义。虽然研究的类型不受限制，但应具有合理的科学目的。开展批准后研究可能是解决监管要求。

批准后研究可额外收集有效性、安全性和用药数据，而且批准后研究中所纳入人群的多样性高于药物获得上市许可前实施的研究所入组的人群。研究中有长期随访，或有与其他治疗方案或标准治疗的比较，则可能提供有关安全性和有效性的重要信息。常见的研究包括额外的药物相互作用、剂量-效应或安全性研究，以及支持在获批准适应症下使用的研究（例如：死亡率/发病率研究、流行病学研究）。这些研究可用于探索该药物在临床实践真实环境中的使用，也可为卫生经济学和卫生技术评估提供信息。

4.4 追加研发

在首次获批后，可继续开展在新患者人群中新适应症或修改后的适应症、新剂量方案或新给药途径的研究。如评价新剂量、新处方或联合用药，则可能需要额外开展非临床和/或人体药理学研究。使用来源于获批药物既往研究或临床经验的数据可为这些计划提供参考。

5. 临床研究的设计要素和数据来源

研究目的影响研究设计和数据来源的选择，进而影响研究支持监管决策和临床实践的力度。正如第4 节所讨论，在药物研发期间会存在多个研究目的。同样，可通过广泛的研究设计和数据来源达到这些目的。第5.1 节至第5.6 节讨论了可用于确定研究设计的关键要素，第5.7 节讨论了可用于研究的各种数据来源。

明确的目的将有助于确定研究设计，反之，对设计具体化的过程可能有助于进一步明确目的。在设计阶段，如果确认存在重要实践考虑，以及关键质量因素的局限性或其他风险，则可能需要修订目的。通过指定估计目标，进一步细化研究目的。估计目标（详见ICH E9（R1）增补：临床试验的统计学原则）是对治疗效应的精确描述，反映了针对临床试验目的提出的临床问题。估计目标在群体水平上汇总比较相同患者在不同治疗条件下的结局。

各种研究之间的一个重要区别是，为受试者分配研究药物是否受研究程序的控制，或者药物分配不受控制，但在研究中观察药物暴露。在本文件中，前一种情况称为干预性研究，后一种情况称为观察性研究。

干预性研究，尤其是随机研究，在药物研发中发挥核心作用，因为此类研究可以更好地控制偏倚。随机研究的设计范围从简单的平行组设计到更复杂的设计。例如：采用适应性设计研究可以根据累积的数据对研究进行前瞻性计划修改，如研究人群的改变或研究过程中研究药物剂量的调整。主方案研究允许在共享框架下研究多种药物或多种病症。平台研究允许以连续的方式研究多种药物，不同药物在不同时间进入研究并根据预先指定的决策规则退出研究。

随机化不可行时，无随机化的研究（无论是干预性研究还是观察性研究）也可在某些情况下发挥作用。观察性研究通常在批准后进行，但可作为药物研发过程中和整个生命周期中的补充证据来源。

随着研究设计的广度，研究还可使用多种数据来源。传统上，研究使用研究特定的数据收集过程。可利用从电子医疗记录或数字健康技术中获得的数据提高研究效率或研究结果的普遍性。

本节介绍确定临床研究设计的重要要素，包括人群、治疗、对照组、反应变量、减少偏倚的方法、统计分析和数据来源。旨在帮助确定达到研究目的所必需的关键质量因素，同时提高研究设计的灵活性及实施研究的效率。尽管重点为干预性研究，但讨论旨在同时适用于干预性研究和观察性研究。本文概述的要素将与目前临床研究中使用并可能会在将来研发的研究类型和数据来源相关。

5.1 研究人群

应选择支持研究目的的研究人群，并通过研究的入组和排除标准进行定义。研究中成功入组期望人群的程度将影响研究达到研究目标的能力。

研究人群可能被狭义地定义，以降低研究受试者的风险，或最大限度地提高研究对检测某种效应的敏感性。相反，它可以被广义地定义为更接近代表拟用药物的不同人群。一般而言，在研发项目早期进行的研究，当对药物的安全性知之甚少时，研究人群的定义往往更为单一化，而在药物研发后期或批准后进行的研究中，研究人群的定义往往更为多样化。此类研究应涉及代表将在临床实践中接受干预的不同人群的受试者。可使用关于可能预测疾病结果或干预效果的受试者特征的现有知识进一步定义研究人群。

研究中的受试者数量（样本量）应足够大，以便为所解决的问题提供可靠答案（参见ICH E9）。受试者数量通常取决于研究的主要目的。如果样本量在其他基础上确定，则其依据应明确且合理。例如：与解决主要有效性问题所需的受试者数量相比，为解决安全性问题或满足重要次要目的而确定的样本量可能更大（参见ICH E1）。如果研究目的包括获得某些亚组的信息，则应努力确保这些亚组具有充分的代表性。

5.2 治疗描述

应对研究的治疗，包括对照，进行明确且具体的阐述。治疗可能为单独治疗（包括不同剂量或方案）、联合治疗或不治疗，并且可能包括指定的背景治疗。治疗定义应与研究目的保持一致（ICH E9（R1））。例如：如果研究目的是了解治疗在临床实践中的效果，则研究可能会指定背景治疗（如果有），这取决于受试者和医疗保健提供者的判断。如果研究目的是了解将药物添加到特定背景治疗时的效果，则应明确且具体地指定所有研究组（包括对照组）的背景治疗。

5.3 对照组的选择

设置对照组的主要目的是区分治疗效应与其他因素的效应，如疾病的自然病程、接受的其他医疗护理或观察者或患者的期望（ICH E10 临床试验中对照组的选择）。研究的治疗效应可能是与不接受药物治疗或与接受其他疗法治疗相比较的效应。可以与安慰剂、不治疗、标准治疗、其他治疗或不同剂量的研究药物进行比较。

对照组的数据可来自研究的内部或外部。使用内部对照组的目的是帮助确保治疗组之间唯一的差异是受试者接受的治疗，而非受试者选择差异、研究结果的时间和测量差异或其他差异。内部对照组的特殊情况为，受试者可以作为他们自己的内部对照，在不同的时间点接受研究药物和对照药物。使用外部对照，受试者从外部来源中选择，对照组受试者可以在更早的时间（历史对照组）或在同一时间但在本研究中的受试者以外的其他研究中接受治疗。

ICH E10 讨论了使用外部对照的重要局限性。需特别注意尽量降低错误推断的可能性。使用外部对照要求对病程清晰且可预测。外部对照受试者在人口统计学和背景特征（例如：病史、伴发疾病）方面可能与研究受试者不同。此外，外部对照受试者在同步治疗和研究结果的测量及其他数据因素方面可能与参与研究的受试者有所不同。由于与外部对照相比，使用内部对照通常更能减少偏倚，特别是在与随机化结合的情况下，因此，应仔细考虑使用和选择外部对照的适用性，并证明其合理。第5.5 节讨论了观察性研究中可能出现的偏倚来源，与外部对照的使用有关。

对于选择的某些外部对照组，可能无法获得受试者水平的数据。但如果可提供综合测量，则可以使用这些数据来形成与接受治疗的受试者进行比较的基础，以估计药物疗效并检验关于此疗效的假设。然而，在进行这些比较或检查单个数据元素质量和完整性时，难以控制外部对照组中的研究受试者与内部治疗组中的研究受试者之间的特征差异。此外，可能无法检查亚组，或无法修改反应变量以与研究中使用的反应变量保持一致。

5.4 反应变量

反应变量是一种被关注的可能受药物影响的指标。反应变量可能与药物的药代动力学、药效学、有效性或安全性有关，或与药物的使用有关，例如：依从批准后的风险最小化措施。研究终点是选择用于评估药物效应的反应变量。

主要终点应能够提供与研究主要目的相关的具有临床相关性和有说服力的证据（ICH E9）。次要终点是与主要目的相关的支持性测量，或与次要目的相关的效应的测量。探索性终点用于进一步解释或支持研究结果或为以后的研究探索新的假设。终点的选择对于目标人群应该是有意义的，并且应考虑到患者的观点。每个研究终点的定义应该是具体的，包括在受试者药物治疗和随访过程中，研究终点的定义是如何确定的，以及在什么时间点确定的。

对药物的了解和特定研究的临床背景和目的都会影响应被收集的反应变量。例如：持续时间相对较短的概念验证研究可能采用药效学结果，而不是主要关注的结果（ICH E9）。然后，可以通过持续时间更长的大规模研究确认是否对主要关注的结果产生具有临床意义的影响。在其他情况下，例如：在药物的安全性特征被明确确认的研究中，安全性数据收集的范围可以根据研究目的进行调整。

5.5 减少偏倚的方法

研究设计应提出可能破坏结果可靠性的潜在偏倚来源。尽管不同类型的研究受到不同偏倚来源的影响，但本节说明的是一些常见来源。ICH E9 主要在干预性研究的背景下讨论控制和减少偏倚的原则。

在有内部对照组的研究中，可采用随机化确保治疗组的可比性，从而将治疗分配中出现偏倚的可能性降至最低。

研究开始时的随机化可以解决随机化时各组别之间的差异，但并不能防止研究期间出现差异而导致的偏倚。随机化后的事件（特别是伴发事件（ICH E9（R1））可能会影响治疗组之间进行比较的效力和解释，如治疗终止或使用补救用药。由于一组受试者以不同比率终止研究，如由于不良事件或缺乏疗效，各组之间的随访模式也可能存在差异。仔细考虑研究期间发生伴发事件的可能性及其影响将有助于确定关键质量因素，如降低研究终止率、治疗终止后继续收集数据，以及在研究终止后回收数据（如适当）。在确定治疗疗效（估计目标）时，重要的是要考虑伴发事件的发生。

隐藏治疗分配（盲法）减少临床研究的实施和解释过程中有意识或无意识的偏倚的发生，这些偏倚可能影响治疗过程、监测、终点确定和受试者反应。单盲研究中，研究者知道治疗分配情况，但受试者不知。如果参与受试者治疗或临床评价的研究者也不知道治疗分配情况，则该研究称为双盲研究。在开放研究中，可通过对研究实施方面（如招募、治疗分配、受试者管理、安全报告和反应变量确定）使用预先制定的决策规则来减少盲法缺失的后果。在可行的情况下，应对研究中心工作人员或申办者进行设盲。

获知中期结果（无论受试者水平还是治疗组水平）有可能引入偏倚，或影响研究的实施和研究结果的解释。因此，需特别考虑与信息流和机密性有关的事项。

观察性研究在偏倚评估和控制方面存在独特的挑战，包括确保受试者患有所研究的病症并确保治疗组之间的可比性、与治疗选择相关的预后因素、反应变量的确定以及基线后伴随的患者护理。在干预性研究中使用外部对照也可能存在上述挑战。有些方法可能减轻上述某些挑战，应在设计阶段予以考虑。

5.6 统计分析

研究的统计分析是试验研究目标所必需的重要因素。统计分析规范和文档对于确保研究结果的完整性十分重要。统计分析的主要特征应在研究设计期间进行规划，并应在研究开始前编写的方案中明确规定（ICH E9）。在了解可能揭示药物疗效的研究结果之前，应详细说明和记录计划的统计分析的全部细节，可使用单独的统计分析计划完成。方案应根据ICH E9（R1）中确立的框架定义估计目标。

应在方案中描述以实现有效性和安全性为关键研究目的的主要和次要终点的统计分析，包括任何期中分析和/或计划的设计调整。应在方案中描述研究中的统计方面其他事项，包括对药物疗效假设的评估和检验的分析方法，以及确定样本量的依据。

统计分析应包括预先指定的敏感性分析，以评估对主要和重要次要分析做出的假设对研究结果的影响（E9（R1））。例如：如果主要分析依赖于对数据缺失原因的特定假设，则应计划敏感性分析，以评估这些假设对研究结果的影响。在观察性研究中，敏感性分析可能会考虑其他潜在的混杂因素。

对于双盲研究，应在披露治疗分配之前确定完成统计分析。因此，如果一项研究包括一项或多项期中分析，则不应在涉及揭盲的期中分析完成后更改所计划的统计分析计划。对于开放研究和单盲研究，理想情况下，将在第一例受试者被随机分配或分配到研究干预组之前，确定有关主要和重要次要分析的详细信息。

对于利用现有数据来源而非主要数据收集的研究（第5.7 节），预先规范分析方法尤其重要，不仅应适用于研究计划的统计分析，还应适用于评估现有数据适用性的任何可行性分析。例如：对于具有外部对照的单臂干预研究，在进行研究干预之前，应指定具体的外部对照。应预先规范分析方法，以便即使在设计研究之前对现有数据来源进行任何审查也不会威胁研究的完整性。

统计分析应按照前瞻性确定的统计分析计划进行，并且应在临床研究报告（ICH E3 临床研究报告）中注明偏离统计分析计划的所有情况。

5.7 研究数据

研究数据包括在研究背景下生成、收集或使用的所有信息，范围为从现有源数据到研究特定的评估。研究数据应包含执行方案和统计分析计划中规定的统计分析所需的必要信息，以及监测受试者的安全性、方案依从性和数据完整性所需的必要信息。

研究数据可大致分为两类：（1）专门为本研究生成的数据（主要数据收集）和（2）从本研究外部获得的数据（次要数据使用）。为研究生成的数据可通过病例报告表、实验室检测、电子版患者报告结局或移动健康工具收集。以外部数据来源为例，包括历史临床研究、国家死亡数据库、疾病和药物登记处、索赔数据以及常规医疗实践的医疗和行政记录。一项研究可以同时使用以上两种类型的数据。

对于所有数据来源，应执行可以确保保护受试者个人数据的程序。研究方案和知情同意书（如适用）应明确解决个人数据保护问题。应遵循与受试者数据保护相关的法规。考虑外部来源的数据时，重要的是确定监管机构是否接受将此类数据用于原始意图以外的目的。

研究数据的质量应足以实现研究目的，并在干预性研究中监测受试者安全。数据质量属性包括一致性（随时间变化确认的一致性）、准确性（收集、传输和处理的正确性）和完整性（无缺失信息）。在研究计划期间，应通过识别与数据来源、收集和处理相关的关键质量因素，主动考虑这些方面。

使用数据记录和编码（或重新编码）标准对支持数据可靠性、方便结果的正确分析和解释以及促进数据共享十分重要。国际上普遍接受的数据标准适用于许多研究数据来源，并应在适用的情况下使用。

对于主要数据收集，在捕获点和后续处理中使用既定方法和标准为前瞻性地确保数据质量提供了机会。

对于次要数据使用，应考虑现有数据的相关性，并在研究方案中明确阐述。例如：使用现有电子健康记录数据而非通过主要数据收集来确定研究终点时，健康记录中有关结局的信息可能需要转换为研究终点。

在某些情况下，次要数据的使用可能不足以涵盖研究的所有方面，可能需要通过收集主要数据进行补充。在本研究中再次使用时，应评价出于不同目的收集的数据的质量。在其获取过程中可能已经应用了严密的质量控制程序；在使用时，这些程序的设计不一定考虑到当前研究的目的。

在使用次要数据时，还有一些额外的注意事项。例如：在选择外部来源数据时和分析外部来源数据之前，应考虑隐藏治疗的方法。另一个示例为，缺乏关于病症或事件的确证信息并不一定意味着该症状或事件不存在。事件的发生与其存在的现有数据来源之间也可能存在延迟。在研究设计阶段、数据分析期间和研究结果的解释中，应尽可能解决不确定性和潜在偏倚来源。

6. 实施、安全性监测以及报告

6.1 研究实施

本指导原则中规定的原则和方法，包括质量源于设计的原则和方法，应为临床研究的实施和报告所采取的方法提供信息。应采用适当的风险缓解措施，以确保关键质量因素的完整性。

6.1.1 方案遵守

遵守研究方案和其他相关文件是必要的，在研究的关键质量因素中，应考虑遵守多个方面。质量源于设计原则的成功应用可最大限度地减少对方案进行修改，并更有可能在整个研究期间遵守方案。如果必须对方案进行修改，应在方案修订案中明确说明修改的依据，并仔细考虑修改对研究实施的影响。

6.1.2 培训

参与研究实施的个人应在参与研究之前接受符合其在研究中职责的培训。为解决在研究过程中观察到的与关键质量因素相关的问题，和/或实施方案修改，可能需要进行更新培训或再培训。

6.1.3 数据管理

收集和管理研究数据的方式和时间线对总体研究数据质量至关重要。操作检查、集中数据监测和统计监督可识别需要采取纠正措施的重要数据质量问题。数据管理程序应考虑到临床研究中使用的数据源的多样性（第5.7 节）。对于干预性临床研究，关于数据管理的进一步指导原则参见ICH E6。

6.1.4 访问期中数据

在研究实施期间不适当地访问数据可能会损害研究的完整性（第5.5 节、第5.6 节以及ICH E9）。在有计划期中分析的研究中，应特别注意哪些人可以访问数据和结果。即使在没有计划期中分析的研究中，也应特别注意对非盲态数据的任何持续监测，以避免不适当的访问。

6.2 研究实施期间的受试者安全

第2.1 节描述了临床研究中伦理行为的重要标准和对受试者的保护。本节描述了在研究实施期间的安全性相关考虑因素。

6.2.1 安全性监测

安全性监测的目的是保护研究受试者和描述药物的安全性特征。研究期间，应明确规定识别、监测和报告安全性问题的程序和系统。该方法应反映研究类型和目的、研究受试者的风险以及对药物和研究人群的了解情况。已有向有关主管部门报告安全性数据以及安全性报告内容和时间的指导原则（ICH E2-E2F 药物警戒，特别是对于干预性临床试验，ICH E6）。

6.2.2 退出标准

对于保留在研究中的受试者，有必要制定明确的停止治疗或研究程序的标准，以确保受试者受到保护，但也应尽量减少关键数据的丢失。

6.2.3 数据监查委员会

在许多临床研究中，安全性监测的一个重要组成部分是使用独立的数据监查委员会。数据监查委员会在实施研究时监测累积的数据，以建议是否继续、修改或终止研究。

在项目计划期间，还应评估是否需要一个独立的数据监查委员会来监查研发项目各项研究中的安全性数据。如果个别研究或整个研发项目需要一个数据监查委员会，则应在研究开始前建立管理其实施的程序，尤其是在保持研究完整性的同时审查干预性试验中的非盲态数据（ICH E9）。

6.3 研究报告

应使用适合研究类型（干预性或观察性研究）和报告信息的格式充分报告临床研究及其结果。ICH E3 特别侧重于干预性临床试验的报告格式，但其基本原则可能适用于其他类型的临床研究（ICH E3 问答）。研究报告的设计应为质量源于设计过程的一部分。报告应描述研究中的关键质量因素。报告研究结果应全面、准确和及时。

应考虑以客观、平衡和非宣传的方式向研究受试者提供总体研究结果的事实总结，包括相关的安全性信息和研究的任何局限性。此外，可以考虑向个体受试者提供有关其研究特定结果的信息（例如：其治疗组别、检查结果）。该信息应由参与受试者健康管理的人员（例如：临床研究者）传达。在提供知情同意时，应告知受试者其将收到的信息以及何时将收到信息。

药物研发临床研究的透明度包括在临床研究开始前，可在公开访问和识别的数据库中注册临床研究，以及公开公布临床研究结果。在观察性研究中采用这种做法也有助于提高透明度。公开客观和无偏倚的信息加强了临床研究、减少了不必要的临床研究，同时为临床实践决策提供了信息，有益于公共卫生以及适用的患者人群。

7. 识别关键质量因素的考虑

如第3 节所述，在研究规划时，应通过前瞻性、跨职能部门的讨论和决策来支持识别关键质量因素。根据第4 节至第6 节中介绍的概念，不同的因素对不同类型的研究至关重要。

在设计研究时，应考虑以下适用的方面，以支持识别关键质量因素：

在研究计划和设计过程中考虑所有利益相关者（包括患者）的参与。
作为先决条件的非临床研究以及适用的临床研究完整，并足以支持所设计的研究。
研究目的阐述了适用于在研发计划中某个研究要解决的相关科学问题，同时考虑到有关产品的已累积知识。
当与选定对照组比较时，临床研究设计支持进行有意义的药物疗效比较。
采取适当措施保护受试者的权利、安全和福祉（知情同意程序、机构审查委员会/伦理委员会审查、研究者和临床研究中心培训、假名化）。
提供给研究受试者的信息应清晰易懂。
应确定申办者和研究者进行研究所需的与其职责相关的能力和培训。
应评估研究可行性，以确保研究在实施上可行。
入选的受试者数量、研究持续时间和研究访视频率足以支持研究目的。
入组标准应反映研究目的，并在临床研究方案中有详细记录。
方案规定达到研究目的、了解药物的获益/风险和监测受试者安全所需的数据收集。
反应变量的选择和评估方法定义明确，并支持对药物疗效的评估。
临床研究程序包括尽量减少偏倚的适当措施（例如：随机分组、盲法）。
预先制定统计分析计划，并定义适用于研究终点和目标人群的分析方法。
已建立支持研究实施的系统和流程，以确保关键研究数据的完整性。
研究监查的范围和性质根据具体的研究设计和目的以及确保受试者安全的需要而进行调整。
评估数据监查委员会的必要性和适当作用。
报告研究结果应有计划、全面、准确、及时和公开。

这些考虑并不详尽，可能不适用于所有研究。在识别每项单独研究的关键质量因素时，可能需要考虑其他方面。

附录：临床研究类型

药物研发理论上是一个合乎逻辑、循序渐进的过程，在此过程中，来自早期研究的信息可用来支持和计划随后的研究。然而，在特定药物研发项目中进行的实际研究顺序可能反映了不同的依赖性和重叠的研究类型。研究还可能涉及适应性设计（可以桥接或结合下文列出的不同研究类型）或旨在研究多种药物或多个适应症或两者兼具的设计（例如：根据主方案进行的多个研究）。下表对临床研究的类型按研究目的进行分类。列举的研究示例，并非详尽或唯一。出现在某一类型下的研究目的也可能出现在另一类型下。

研究类型	研究目的	研究示例
人体药理学	评估耐受性和安全性；阐明/描述临床PK¹和PD²；探索药物代谢和药物相互作用；评价活性，评估免疫原性；评估肾/肝耐受性；评估心脏毒性	空腹/餐后条件下的BA³/BE⁴研究；剂量-耐受性研究；单次和多次递增剂量的PK 和/或PD 研究；药物-药物相互作用研究；QTc 延长研究；给药装置的人为因素研究
探索性	探索用于目标适应症；评估后续研究的剂量/给药方案；探索剂量-效应/暴露量-效应关系；提供确证性研究设计的依据（例如：目标人群、临床终点、患者报告结局指标、影响治疗效果的因素）	采用替代终点或药理学终点或临床指标，在明确的狭义患者人群中进行的持续时间相对较短的随机对照临床试验；剂量范围探索研究；生物标志物探索研究；验证患者报告结局的研究；可结合探索性和确证性目的的适应性设计
确证性	证明/确证有效性；在更大、更具代表性的患者人群中确定安全性特征；为评估获益/风险关系提供足够依据以支持上市许可；确定剂量-效应/暴露量-效应关系；确定安全性特征并确认在特殊人群（如儿童，老年人）中的有效性	在更大、更具代表性的患者人群中确定有效性的随机对照临床试验；剂量-效应研究；临床安全性研究；死亡率/发病率结局研究；特殊人群中的研究；在单一方案中证明多种药物有效性的研究
批准后	扩展对药物在普通人群、特殊人群和/或环境中的获益/风险关系的认识；识别较少见的不良反应；优化给药建议	有效性对照研究；长期随访研究；死亡率/发病率或其他额外终点研究；大规模、简单随机试验；药物经济学研究；药物流行病学研究；临床实践中药物使用的观察性研究；疾病或药物登记研究
¹ PK：药代动力学；² PD：药效学；³ BA研究：生物利用度；⁴ BE研究：生物等效性

ICH E9

ICH E9 Statistical Principles for Clinical Trials

中文版

ICH E9 临床试验的统计学原则

I. INTRODUCTION

1.1 Background and Purpose

The efficacy and safety of medicinal products should be demonstrated by clinical trials which follow the guidance in 'Good Clinical Practice: Consolidated Guideline' (ICH E6) adopted by the ICH, 1 May 1996. The role of statistics in clinical trial design and analysis is acknowledged as essential in that ICH guideline. The proliferation of statistical research in the area of clinical trials, coupled with the critical role of clinical research in the drug approval process and health care in general, necessitates a succinct document on statistical issues related to clinical trials. This guidance is written primarily to attempt to harmonise the principles of statistical methodology applied to clinical trials for marketing applications submitted in Europe, Japan and the United States.

As a starting point, this guideline utilised the CPMP (Committee for Proprietary Medicinal Products) Note for Guidance entitled 'Biostatistical Methodology in Clinical Trials in Applications for Marketing Authorisations for Medicinal Products' (December, 1994). It was also influenced by 'Guidelines on the Statistical Analysis of Clinical Studies' (March, 1992) from the Japanese Ministry of Health and Welfare and the U.S. Food and Drug Administration document entitled 'Guideline for the Format and Content of the Clinical and Statistical Sections of a New Drug Application' (July, 1988). Some topics related to statistical principles and methodology are also embedded within other ICH guidelines, particularly those listed below. The specific guidance that contains related text will be identified in various sections of this document.

E1A:	The Extent of Population Exposure to Assess Clinical Safety
E2A:	Clinical Safety Data Management: Definitions and Standards for Expedited Reporting
E2B:	Clinical Safety Data Management: Data Elements for Transmission of Individual Case Safety Reports
E2C:	Clinical Safety Data Management: Periodic Safety Update Reports for Marketed Drugs
E3:	Structure and Content of Clinical Study Reports
E4:	Dose-Response Information to Support Drug Registration
E5:	Ethnic Factors in the Acceptability of Foreign Clinical Data
E6:	Good Clinical Practice: Consolidated Guideline
E7:	Studies in Support of Special Populations: Geriatrics
E8:	General Considerations for Clinical Trials
E10:	Choice of Control Group in Clinical Trials
M1:	Standardisation of Medical Terminology for Regulatory Purposes
M3:	Non-Clinical Safety Studies for the Conduct of Human Clinical Trials for Pharmaceuticals.

This guidance is intended to give direction to sponsors in the design, conduct, analysis, and evaluation of clinical trials of an investigational product in the context of its overall clinical development. The document will also assist scientific experts charged with preparing application summaries or assessing evidence of efficacy and safety, principally from clinical trials in later phases of development.

1.2 Scope and Direction

The focus of this guidance is on statistical principles. It does not address the use of specific statistical procedures or methods. Specific procedural steps to ensure that principles are implemented properly are the responsibility of the sponsor. Integration of data across clinical trials is discussed, but is not a primary focus of this guidance. Selected principles and procedures related to data management or clinical trial monitoring activities are covered in other ICH guidelines and are not addressed here.

This guidance should be of interest to individuals from a broad range of scientific disciplines. However, it is assumed that the actual responsibility for all statistical work associated with clinical trials will lie with an appropriately qualified and experienced statistician, as indicated in ICH E6. The role and responsibility of the trial statistician (see Glossary), in collaboration with other clinical trial professionals, is to ensure that statistical principles are applied appropriately in clinical trials supporting drug development. Thus, the trial statistician should have a combination of education/training and experience sufficient to implement the principles articulated in this guidance.

For each clinical trial contributing to a marketing application, all important details of its design and conduct and the principal features of its proposed statistical analysis should be clearly specified in a protocol written before the trial begins. The extent to which the procedures in the protocol are followed and the primary analysis is planned a priori will contribute to the degree of confidence in the final results and conclusions of the trial. The protocol and subsequent amendments should be approved by the responsible personnel, including the trial statistician. The trial statistician should ensure that the protocol and any amendments cover all relevant statistical issues clearly and accurately, using technical terminology as appropriate.

The principles outlined in this guidance are primarily relevant to clinical trials conducted in the later phases of development, many of which are confirmatory trials of efficacy. In addition to efficacy, confirmatory trials may have as their primary variable a safety variable (e.g. an adverse event, a clinical laboratory variable or an electrocardiographic measure), a pharmacodynamic or a pharmacokinetic variable (as in a confirmatory bioequivalence trial). Furthermore, some confirmatory findings may be derived from data integrated across trials, and selected principles in this guidance are applicable in this situation. Finally, although the early phases of drug development consist mainly of clinical trials that are exploratory in nature, statistical principles are also relevant to these clinical trials. Hence, the substance of this document should be applied as far as possible to all phases of clinical development.

Many of the principles delineated in this guidance deal with minimising bias (see Glossary) and maximising precision. As used in this guidance, the term 'bias' describes the systematic tendency of any factors associated with the design, conduct, analysis and interpretation of the results of clinical trials to make the estimate of a treatment effect (see Glossary) deviate from its true value. It is important to identify potential sources of bias as completely as possible so that attempts to limit such bias may be made. The presence of bias may seriously compromise the ability to draw valid conclusions from clinical trials.

Some sources of bias arise from the design of the trial, for example an assignment of treatments such that subjects at lower risk are systematically assigned to one treatment. Other sources of bias arise during the conduct and analysis of a clinical trial. For example, protocol violations and exclusion of subjects from analysis based upon knowledge of subject outcomes are possible sources of bias that may affect the accurate assessment of the treatment effect. Because bias can occur in subtle or unknown ways and its effect is not measurable directly, it is important to evaluate the robustness of the results and primary conclusions of the trial. Robustness is a concept that refers to the sensitivity of the overall conclusions to various limitations of the data, assumptions, and analytic approaches to data analysis. Robustness implies that the treatment effect and primary conclusions of the trial are not substantially affected when analyses are carried out based on alternative assumptions or analytic approaches. The interpretation of statistical measures of uncertainty of the treatment effect and treatment comparisons should involve consideration of the potential contribution of bias to the p-value, confidence interval, or inference.

Because the predominant approaches to the design and analysis of clinical trials have been based on frequentist statistical methods, the guidance largely refers to the use of frequentist methods (see Glossary) when discussing hypothesis testing and/or confidence intervals. This should not be taken to imply that other approaches are not appropriate: the use of Bayesian (see Glossary) and other approaches may be considered when the reasons for their use are clear and when the resulting conclusions are sufficiently robust.

II. CONSIDERATIONS FOR OVERALL CLINICAL DEVELOPMENT

2.1 Trial Context

2.1.1 Development Plan

The broad aim of the process of clinical development of a new drug is to find out whether there is a dose range and schedule at which the drug can be shown to be simultaneously safe and effective, to the extent that the risk-benefit relationship is acceptable. The particular subjects who may benefit from the drug, and the specific indications for its use, also need to be defined.

Satisfying these broad aims usually requires an ordered programme of clinical trials, each with its own specific objectives (see ICH E8). This should be specified in a clinical plan, or a series of plans, with appropriate decision points and flexibility to allow modification as knowledge accumulates. A marketing application should clearly describe the main content of such plans, and the contribution made by each trial. Interpretation and assessment of the evidence from the total programme of trials involves synthesis of the evidence from the individual trials (see Section 7.2). This is facilitated by ensuring that common standards are adopted for a number of features of the trials such as dictionaries of medical terms, definition and timing of the main measurements, handling of protocol deviations and so on. A statistical summary, overview or meta-analysis (see Glossary) may be informative when medical questions are addressed in more than one trial. Where possible this should be envisaged in the plan so that the relevant trials are clearly identified and any necessary common features of their designs are specified in advance. Other major statistical issues (if any) that are expected to affect a number of trials in a common plan should be addressed in that plan.

2.1.2 Confirmatory Trial

A confirmatory trial is an adequately controlled trial in which the hypotheses are stated in advance and evaluated. As a rule, confirmatory trials are necessary to provide firm evidence of efficacy or safety. In such trials the key hypothesis of interest follows directly from the trial’s primary objective, is always pre-defined, and is the hypothesis that is subsequently tested when the trial is complete. In a confirmatory trial it is equally important to estimate with due precision the size of the effects attributable to the treatment of interest and to relate these effects to their clinical significance.

Confirmatory trials are intended to provide firm evidence in support of claims and hence adherence to protocols and standard operating procedures is particularly important; unavoidable changes should be explained and documented, and their effect examined. A justification of the design of each such trial, and of other important statistical aspects such as the principal features of the planned analysis, should be set out in the protocol. Each trial should address only a limited number of questions.

Firm evidence in support of claims requires that the results of the confirmatory trials demonstrate that the investigational product under test has clinical benefits. The confirmatory trials should therefore be sufficient to answer each key clinical question relevant to the efficacy or safety claim clearly and definitively. In addition, it is important that the basis for generalisation (see Glossary) to the intended patient population is understood and explained; this may also influence the number and type (e.g. specialist or general practitioner) of centres and/or trials needed. The results of the confirmatory trial(s) should be robust. In some circumstances the weight of evidence from a single confirmatory trial may be sufficient.

2.1.3 Exploratory Trial

The rationale and design of confirmatory trials nearly always rests on earlier clinical work carried out in a series of exploratory studies. Like all clinical trials, these exploratory studies should have clear and precise objectives. However, in contrast to confirmatory trials, their objectives may not always lead to simple tests of pre-defined hypotheses. In addition, exploratory trials may sometimes require a more flexible approach to design so that changes can be made in response to accumulating results. Their analysis may entail data exploration; tests of hypothesis may be carried out, but the choice of hypothesis may be data dependent. Such trials cannot be the basis of the formal proof of efficacy, although they may contribute to the total body of relevant evidence.

Any individual trial may have both confirmatory and exploratory aspects. For example, in most confirmatory trials the data are also subjected to exploratory analyses which serve as a basis for explaining or supporting their findings and for suggesting further hypotheses for later research. The protocol should make a clear distinction between the aspects of a trial which will be used for confirmatory proof and the aspects which will provide data for exploratory analysis.

2.2 Scope of Trials

2.2.1 Population

In the earlier phases of drug development the choice of subjects for a clinical trial may be heavily influenced by the wish to maximise the chance of observing specific clinical effects of interest, and hence they may come from a very narrow subgroup of the total patient population for which the drug may eventually be indicated. However, by the time the confirmatory trials are undertaken, the subjects in the trials should more closely mirror the target population. Hence, in these trials it is generally helpful to relax the inclusion and exclusion criteria as much as possible within the target population, while maintaining sufficient homogeneity to permit precise estimation of treatment effects. No individual clinical trial can be expected to be totally representative of future users, because of the possible influences of geographical location, the time when it is conducted, the medical practices of the particular investigator(s) and clinics, and so on. However, the influence of such factors should be reduced wherever possible, and subsequently discussed during the interpretation of the trial results.

2.2.2 Primary and Secondary Variables

The primary variable (‘target’ variable, primary endpoint) should be the variable capable of providing the most clinically relevant and convincing evidence directly related to the primary objective of the trial. There should generally be only one primary variable. This will usually be an efficacy variable, because the primary objective of most confirmatory trials is to provide strong scientific evidence regarding efficacy. Safety/tolerability may sometimes be the primary variable, and will always be an important consideration. Measurements relating to quality of life and health economics are further potential primary variables. The selection of the primary variable should reflect the accepted norms and standards in the relevant field of research. The use of a reliable and validated variable with which experience has been gained either in earlier studies or in published literature is recommended. There should be sufficient evidence that the primary variable can provide a valid and reliable measure of some clinically relevant and important treatment benefit in the patient population described by the inclusion and exclusion criteria. The primary variable should generally be the one used when estimating the sample size (see section 3.5).

In many cases, the approach to assessing subject outcome may not be straightforward and should be carefully defined. For example, it is inadequate to specify mortality as a primary variable without further clarification; mortality may be assessed by comparing proportions alive at fixed points in time, or by comparing overall distributions of survival times over a specified interval. Another common example is a recurring event; the measure of treatment effect may again be a simple dichotomous variable (any occurrence during a specified interval), time to first occurrence, rate of occurrence (events per time units of observation), etc. The assessment of functional status over time in studying treatment for chronic disease presents other challenges in selection of the primary variable. There are many possible approaches, such as comparisons of the assessments done at the beginning and end of the interval of observation, comparisons of slopes calculated from all assessments throughout the interval, comparisons of the proportions of subjects exceeding or declining beyond a specified threshold, or comparisons based on methods for repeated measures data. To avoid multiplicity concerns arising from post hoc definitions, it is critical to specify in the protocol the precise definition of the primary variable as it will be used in the statistical analysis. In addition, the clinical relevance of the specific primary variable selected and the validity of the associated measurement procedures will generally need to be addressed and justified in the protocol.
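
As an illustration only, the three ways of summarising a recurring event mentioned above (any occurrence, time to first occurrence, rate of occurrence) can be sketched as follows. The `SubjectEvents` structure, the function names and the example values are hypothetical and are not part of this guidance; the point is simply that each definition yields a different variable from the same raw data, which is why the protocol should fix one of them in advance.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SubjectEvents:
    """Hypothetical record: event times in days from randomisation."""
    event_days: List[float]
    followup_days: float  # length of the pre-specified observation interval


def any_occurrence(s: SubjectEvents) -> bool:
    """Dichotomous variable: at least one event within the interval."""
    return any(d <= s.followup_days for d in s.event_days)


def time_to_first(s: SubjectEvents) -> Tuple[float, bool]:
    """Return (time, observed); censored at end of follow-up if no event."""
    in_window = [d for d in s.event_days if d <= s.followup_days]
    if in_window:
        return min(in_window), True
    return s.followup_days, False


def event_rate(s: SubjectEvents, per_days: float = 365.25) -> float:
    """Events per `per_days` days of observation."""
    n_events = sum(1 for d in s.event_days if d <= s.followup_days)
    return n_events / s.followup_days * per_days


# Made-up subject with three events during 180 days of observation.
subject = SubjectEvents(event_days=[40.0, 95.0, 170.0], followup_days=180.0)
print(any_occurrence(subject), time_to_first(subject), event_rate(subject))
```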

The primary variable should be specified in the protocol, along with the rationale for its selection. Redefinition of the primary variable after unblinding will almost always be unacceptable, since the biases this introduces are difficult to assess. When the clinical effect defined by the primary objective is to be measured in more than one way, the protocol should identify one of the measurements as the primary variable on the basis of clinical relevance, importance, objectivity, and/or other relevant characteristics, whenever such selection is feasible.

Secondary variables are either supportive measurements related to the primary objective or measurements of effects related to the secondary objectives. Their pre-definition in the protocol is also important, as well as an explanation of their relative importance and roles in interpretation of trial results. The number of secondary variables should be limited and should be related to the limited number of questions to be answered in the trial.

2.2.3 Composite Variables

If a single primary variable cannot be selected from multiple measurements associated with the primary objective, another useful strategy is to integrate or combine the multiple measurements into a single or 'composite' variable, using a pre-defined algorithm. Indeed, the primary variable sometimes arises as a combination of multiple clinical measurements (e.g. the rating scales used in arthritis, psychiatric disorders and elsewhere). This approach addresses the multiplicity problem without requiring adjustment to the type I error. The method of combining the multiple measurements should be specified in the protocol, and an interpretation of the resulting scale should be provided in terms of the size of a clinically relevant benefit. When a composite variable is used as a primary variable, the components of this variable may sometimes be analysed separately, where clinically meaningful and validated. When a rating scale is used as a primary variable, it is especially important to address such factors as content validity (see Glossary), inter- and intra-rater reliability (see Glossary) and responsiveness for detecting changes in the severity of disease.

2.2.4 Global Assessment Variables

In some cases, 'global assessment' variables (see Glossary) are developed to measure the overall safety, overall efficacy, and/or overall usefulness of a treatment. This type of variable integrates objective variables and the investigator’s overall impression about the state or change in the state of the subject, and is usually a scale of ordered categorical ratings. Global assessments of overall efficacy are well established in some therapeutic areas, such as neurology and psychiatry.

Global assessment variables generally have a subjective component. When a global assessment variable is used as a primary or secondary variable, fuller details of the scale should be included in the protocol with respect to:

1) the relevance of the scale to the primary objective of the trial;

2) the basis for the validity and reliability of the scale;

3) how to utilise the data collected on an individual subject to assign him/her to a unique category of the scale;

4) how to assign subjects with missing data to a unique category of the scale, or otherwise evaluate them.

If objective variables are considered by the investigator when making a global assessment, then those objective variables should be considered as additional primary, or at least important secondary, variables.

Global assessment of usefulness integrates components of both benefit and risk and reflects the decision-making process of the treating physician, who must weigh benefit and risk in making product use decisions. A problem with global usefulness variables is that their use could in some cases lead to the result of two products being declared equivalent despite having very different profiles of beneficial and adverse effects. For example, judging the global usefulness of a treatment as equivalent or superior to an alternative may mask the fact that it has little or no efficacy but fewer adverse effects. Therefore, it is not advisable to use a global usefulness variable as a primary variable. If global usefulness is specified as primary, it is important to consider specific efficacy and safety outcomes separately as additional primary variables.

2.2.5 Multiple Primary Variables

It may sometimes be desirable to use more than one primary variable, each of which (or a subset of which) could be sufficient to cover the range of effects of the therapies. The planned manner of interpretation of this type of evidence should be carefully spelled out. It should be clear whether an impact on any of the variables, some minimum number of them, or all of them, would be considered necessary to achieve the trial objectives. The primary hypothesis or hypotheses and parameters of interest (e.g. mean, percentage, distribution) should be clearly stated with respect to the primary variables identified, and the approach to statistical inference described. The effect on the type I error should be explained because of the potential for multiplicity problems (see Section 5.6); the method of controlling type I error should be given in the protocol. The extent of intercorrelation among the proposed primary variables may be considered in evaluating the impact on type I error. If the purpose of the trial is to demonstrate effects on all of the designated primary variables, then there is no need for adjustment of the type I error, but the impact on type II error and sample size should be carefully considered.
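
The multiplicity concern can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that the tests on the primary variables are independent and that success on any one variable would be claimed as a positive trial; it also uses the Bonferroni adjustment as one simple example of a control method, not as a method prescribed by this guidance. The function names and the values of `alpha` and `k` are hypothetical.

```python
def familywise_error_any(alpha: float, k: int) -> float:
    """P(at least one false positive) among k independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_level(alpha: float, k: int) -> float:
    """Per-test level under the Bonferroni adjustment (illustrative choice)."""
    return alpha / k


alpha, k = 0.05, 3

# No adjustment, success claimed if ANY variable is significant:
unadjusted = familywise_error_any(alpha, k)

# Each variable tested at alpha / k instead:
adjusted = familywise_error_any(bonferroni_level(alpha, k), k)

print(f"unadjusted family-wise error: {unadjusted:.4f}")
print(f"Bonferroni-adjusted:          {adjusted:.4f}")
```

When an effect on all of the primary variables is required, each test can be carried out at the full level `alpha`, because a false claim then needs every test to be falsely significant; the cost appears instead as reduced power, which is why the impact on type II error and sample size needs attention. Positive correlation among the variables would make the unadjusted figure above smaller than the independence calculation suggests.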

2.2.6 Surrogate Variables

When direct assessment of the clinical benefit to the subject through observing actual clinical efficacy is not practical, indirect criteria (surrogate variables - see Glossary) may be considered. Commonly accepted surrogate variables are used in a number of indications where they are believed to be reliable predictors of clinical benefit. There are two principal concerns with the introduction of any proposed surrogate variable. First, it may not be a true predictor of the clinical outcome of interest. For example it may measure treatment activity associated with one specific pharmacological mechanism, but may not provide full information on the range of actions and ultimate effects of the treatment, whether positive or negative. There have been many instances where treatments showing a highly positive effect on a proposed surrogate have ultimately been shown to be detrimental to the subjects' clinical outcome; conversely, there are cases of treatments conferring clinical benefit without measurable impact on proposed surrogates. Secondly, proposed surrogate variables may not yield a quantitative measure of clinical benefit that can be weighed directly against adverse effects. Statistical criteria for validating surrogate variables have been proposed but the experience with their use is relatively limited. In practice, the strength of the evidence for surrogacy depends upon (i) the biological plausibility of the relationship, (ii) the demonstration in epidemiological studies of the prognostic value of the surrogate for the clinical outcome and (iii) evidence from clinical trials that treatment effects on the surrogate correspond to effects on the clinical outcome. Relationships between clinical and surrogate variables for one product do not necessarily apply to a product with a different mode of action for treating the same disease.

2.2.7 Categorised Variables

Dichotomisation or other categorisation of continuous or ordinal variables may sometimes be desirable. Criteria of 'success' and 'response' are common examples of dichotomies which require precise specification in terms of, for example, a minimum percentage improvement (relative to baseline) in a continuous variable, or a ranking categorised as at or above some threshold level (e.g., 'good') on an ordinal rating scale.

The reduction of diastolic blood pressure below 90 mmHg is a common dichotomisation. Categorisations are most useful when they have clear clinical relevance. The criteria for categorisation should be pre-defined and specified in the protocol, as knowledge of trial results could easily bias the choice of such criteria. Because categorisation normally implies a loss of information, a consequence will be a loss of power in the analysis; this should be accounted for in the sample size calculation.
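
A pre-defined categorisation rule is easiest to audit when it is written down as an explicit function before any data are seen. The following minimal sketch shows the two kinds of criterion mentioned above. The 90 mmHg threshold comes from the example in the text; the 20% minimum improvement, the function names and the convention that a lower value means improvement are assumptions made for illustration only.

```python
def is_responder_dbp(dbp_mmhg: float, threshold_mmhg: float = 90.0) -> bool:
    """Responder if diastolic blood pressure is strictly below the threshold."""
    return dbp_mmhg < threshold_mmhg


def is_responder_pct(baseline: float, final: float,
                     min_pct_improvement: float = 20.0) -> bool:
    """Responder if the reduction from baseline reaches a minimum percentage.

    Assumes lower values are better and baseline is positive; the 20%
    default is hypothetical and would be fixed in the protocol.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    pct_improvement = (baseline - final) / baseline * 100.0
    return pct_improvement >= min_pct_improvement


print(is_responder_dbp(88.0))          # below threshold
print(is_responder_pct(100.0, 75.0))   # 25% reduction from baseline
```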

2.3 Design Techniques to Avoid Bias

The most important design techniques for avoiding bias in clinical trials are blinding and randomisation, and these should be normal features of most controlled clinical trials intended to be included in a marketing application. Most such trials follow a double-blind approach in which treatments are pre-packed in accordance with a suitable randomisation schedule, and supplied to the trial centre(s) labelled only with the subject number and the treatment period so that no one involved in the conduct of the trial is aware of the specific treatment allocated to any particular subject, not even as a code letter. This approach will be assumed in Section 2.3.1 and most of Section 2.3.2, exceptions being considered at the end.

Bias can also be reduced at the design stage by specifying procedures in the protocol aimed at minimising any anticipated irregularities in trial conduct that might impair a satisfactory analysis, including various types of protocol violations, withdrawals and missing values. The protocol should consider ways both to reduce the frequency of such problems, and also to handle the problems that do occur in the analysis of data.

2.3.1 Blinding

Blinding or masking is intended to limit the occurrence of conscious and unconscious bias in the conduct and interpretation of a clinical trial arising from the influence which the knowledge of treatment may have on the recruitment and allocation of subjects, their subsequent care, the attitudes of subjects to the treatments, the assessment of end-points, the handling of withdrawals, the exclusion of data from analysis, and so on. The essential aim is to prevent identification of the treatments until all such opportunities for bias have passed.

Difficulties in achieving the double-blind ideal can arise: the treatments may be of a completely different nature, for example, surgery and drug therapy; two drugs may have different formulations and, although they could be made indistinguishable by the use of capsules, changing the formulation might also change the pharmacokinetic and/or pharmacodynamic properties and hence require that bioequivalence of the formulations be established; the daily pattern of administration of two treatments may differ. One way of achieving double-blind conditions under these circumstances is to use a 'double-dummy' (see Glossary) technique. This technique may sometimes force an administration scheme that is sufficiently unusual to influence adversely the motivation and compliance of the subjects. Ethical difficulties may also interfere with its use when, for example, it entails dummy operative procedures. Nevertheless, extensive efforts should be made to overcome these difficulties.

In this document, the blind review (see Glossary) of data refers to the checking of data during the period of time between trial completion (the last observation on the last subject) and the breaking of the blind.

2.3.2 Randomisation

Randomisation introduces a deliberate element of chance into the assignment of treatments to subjects in a clinical trial. During subsequent analysis of the trial data, it provides a sound statistical basis for the quantitative evaluation of the evidence relating to treatment effects. It also tends to produce treatment groups in which the distributions of prognostic factors, known and unknown, are similar. In combination with blinding, randomisation helps to avoid possible bias in the selection and allocation of subjects arising from the predictability of treatment assignments.

The randomisation schedule of a clinical trial documents the random allocation of treatments to subjects. In the simplest situation it is a sequential list of treatments (or treatment sequences in a crossover trial) or corresponding codes by subject number. The logistics of some trials, such as those with a screening phase, may make matters more complicated, but the unique pre-planned assignment of treatment, or treatment sequence, to subject should be clear. Different trial designs will require different procedures for generating randomisation schedules. The randomisation schedule should be reproducible (if the need arises).

Although unrestricted randomisation is an acceptable approach, some advantages can generally be gained by randomising subjects in blocks. This helps to increase the comparability of the treatment groups, particularly when subject characteristics may change over time, as a result, for example, of changes in recruitment policy. It also provides a better guarantee that the treatment groups will be of nearly equal size. In crossover trials it provides the means of obtaining balanced designs with their greater efficiency and easier interpretation. Care should be taken to choose block lengths that are sufficiently short to limit possible imbalance, but that are long enough to avoid predictability towards the end of the sequence in a block. Investigators and other relevant staff should generally be blind to the block length; the use of two or more block lengths, randomly selected for each block, can achieve the same purpose. (Theoretically, in a double-blind trial predictability does not matter, but the pharmacological effects of drugs may provide the opportunity for intelligent guesswork.)
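
A minimal sketch of block randomisation with randomly varying block lengths is given below. It is not a validated randomisation system; the function name, the default treatments, the candidate block sizes and the seed are all hypothetical. A fixed seed is used so that the schedule can be regenerated if the need arises.

```python
import random
from typing import List, Sequence


def permuted_block_schedule(n_subjects: int,
                            treatments: Sequence[str] = ("A", "B"),
                            block_multiples: Sequence[int] = (2, 3),
                            seed: int = 20240101) -> List[str]:
    """Build a schedule from permuted blocks of randomly chosen length.

    Each block holds m copies of every treatment, where m is drawn at
    random from `block_multiples`, so block length is not constant.
    """
    rng = random.Random(seed)  # seeded generator makes the schedule reproducible
    schedule: List[str] = []
    while len(schedule) < n_subjects:
        m = rng.choice(block_multiples)
        block = list(treatments) * m
        rng.shuffle(block)          # random order within the block
        schedule.extend(block)
    # The final block may be cut short, which can leave a small imbalance.
    return schedule[:n_subjects]


schedule = permuted_block_schedule(50)
print(schedule[:12], schedule.count("A"), schedule.count("B"))
```

With two treatments and blocks of 4 or 6, the difference in group sizes at any point cannot exceed 3, while the varying block length makes the last assignments in a block harder to guess.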

In multicentre trials (see Glossary) the randomisation procedures should be organised centrally. It is advisable to have a separate random scheme for each centre, i.e. to stratify by centre or to allocate several whole blocks to each centre. More generally, stratification by important prognostic factors measured at baseline (e.g. severity of disease, age, sex, etc.) may sometimes be valuable in order to promote balanced allocation within strata; this has greater potential benefit in small trials. The use of more than two or three stratification factors is rarely necessary, is less successful at achieving balance and is logistically troublesome. The use of a dynamic allocation procedure (see below) may help to achieve balance across a number of stratification factors simultaneously provided the rest of the trial procedures can be adjusted to accommodate an approach of this type. Factors on which randomisation has been stratified should be accounted for later in the analysis.

The next subject to be randomised into a trial should always receive the treatment corresponding to the next free number in the appropriate randomisation schedule (in the respective stratum, if randomisation is stratified). The appropriate number and associated treatment for the next subject should only be allocated when entry of that subject to the randomised part of the trial has been confirmed. Details of the randomisation that facilitate predictability (e.g. block length) should not be contained in the trial protocol. The randomisation schedule itself should be filed securely by the sponsor or an independent party in a manner that ensures that blindness is properly maintained throughout the trial. Access to the randomisation schedule during the trial should take into account the possibility that, in an emergency, the blind may have to be broken for any subject. The procedure to be followed, the necessary documentation, and the subsequent treatment and assessment of the subject should all be described in the protocol.
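
The two preceding paragraphs (whole blocks allocated to each centre, and each new subject receiving the next free number in the appropriate schedule) can be sketched as follows. This is an illustration under stated assumptions, not a description of any particular system: the centre codes, the fixed block of four, the seed and the function names are hypothetical, and a real trial would normally also vary or conceal the block length.

```python
import random
from typing import Dict, List, Sequence


def centre_schedules(centres: Sequence[str],
                     blocks_per_centre: int = 3,
                     block: Sequence[str] = ("A", "A", "B", "B"),
                     seed: int = 12345) -> Dict[str, List[str]]:
    """Give every centre its own schedule made of whole permuted blocks."""
    rng = random.Random(seed)  # seeded so the schedules are reproducible
    schedules: Dict[str, List[str]] = {}
    for centre in centres:
        slots: List[str] = []
        for _ in range(blocks_per_centre):
            permuted = list(block)
            rng.shuffle(permuted)
            slots.extend(permuted)
        schedules[centre] = slots
    return schedules


def next_assignment(schedules: Dict[str, List[str]],
                    used: Dict[str, int],
                    centre: str) -> str:
    """Return the treatment at the next free number for this centre.

    Intended to be called only once entry of the subject to the
    randomised part of the trial has been confirmed.
    """
    index = used.get(centre, 0)
    treatment = schedules[centre][index]  # IndexError if the schedule is exhausted
    used[centre] = index + 1
    return treatment


schedules = centre_schedules(["C01", "C02"])
used: Dict[str, int] = {}
print(next_assignment(schedules, used, "C01"))
```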

Dynamic allocation is an alternative procedure in which the allocation of treatment to a subject is influenced by the current balance of allocated treatments and, in a stratified trial, by the stratum to which the subject belongs and the balance within that stratum. Deterministic dynamic allocation procedures should be avoided and an appropriate element of randomisation should be incorporated for each treatment allocation. Every effort should be made to retain the double-blind status of the trial. For example, knowledge of the treatment code may be restricted to a central trial office from where the dynamic allocation is controlled, generally through telephone contact. This in turn permits additional checks of eligibility criteria and establishes entry into the trial, features that can be valuable in certain types of multicentre trial. The usual system of pre-packing and labelling drug supplies for double-blind trials can then be followed, but the order of their use is no longer sequential. It is desirable to use appropriate computer algorithms to keep personnel at the central trial office blind to the treatment code. The complexity of the logistics and potential impact on the analysis should be carefully evaluated when considering dynamic allocation.
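
One family of dynamic allocation procedures is minimisation with a random element. The sketch below is a simplified illustration of that idea, not a procedure specified by this guidance: the imbalance score (sum of ranges of marginal counts), the probability `p_best` of following the balancing choice, the two treatment labels and all function names are assumptions. The random element is what keeps the procedure from being deterministic.

```python
import random
from typing import Dict, Tuple

TREATMENTS = ("A", "B")
Counts = Dict[Tuple[Tuple[str, str], str], int]  # ((factor, level), treatment) -> n


def imbalance_if_assigned(counts: Counts, factors: Dict[str, str],
                          candidate: str) -> int:
    """Total imbalance over the subject's factor levels if `candidate` is given."""
    total = 0
    for factor_level in factors.items():
        level_counts = [counts.get((factor_level, t), 0) + (1 if t == candidate else 0)
                        for t in TREATMENTS]
        total += max(level_counts) - min(level_counts)
    return total


def allocate(counts: Counts, factors: Dict[str, str],
             rng: random.Random, p_best: float = 0.8) -> str:
    """Allocate one subject and update the running counts in place."""
    scores = {t: imbalance_if_assigned(counts, factors, t) for t in TREATMENTS}
    best = min(scores.values())
    preferred = [t for t in TREATMENTS if scores[t] == best]
    if len(preferred) == len(TREATMENTS):
        choice = rng.choice(TREATMENTS)        # tie: simple randomisation
    elif rng.random() < p_best:
        choice = rng.choice(preferred)         # usually follow the balancing choice
    else:
        choice = rng.choice([t for t in TREATMENTS if t not in preferred])
    for factor_level in factors.items():
        counts[(factor_level, choice)] = counts.get((factor_level, choice), 0) + 1
    return choice


rng = random.Random(7)
counts: Counts = {}
print(allocate(counts, {"sex": "F", "severity": "high"}, rng))
```

Because each allocation depends on the subjects already entered, the analysis and the logistics (central control of the allocation, out-of-sequence use of pre-packed supplies) need the careful evaluation described above.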

III. TRIAL DESIGN CONSIDERATIONS

3.1 Design Configuration

3.1.1 Parallel Group Design

The most common clinical trial design for confirmatory trials is the parallel group design in which subjects are randomised to one of two or more arms, each arm being allocated a different treatment. These treatments will include the investigational product at one or more doses, and one or more control treatments, such as placebo and/or an active comparator. The assumptions underlying this design are less complex than for most other designs. However, as with other designs, there may be additional features of the trial that complicate the analysis and interpretation (e.g. covariates, repeated measurements over time, interactions between design factors, protocol violations, dropouts (see Glossary) and withdrawals).

3.1.2 Crossover Design

In the crossover design, each subject is randomised to a sequence of two or more treatments, and hence acts as his own control for treatment comparisons. This simple manoeuvre is attractive primarily because it reduces the number of subjects and usually the number of assessments needed to achieve a specific power, sometimes to a marked extent. In the simplest 2×2 crossover design each subject receives each of two treatments in randomised order in two successive treatment periods, often separated by a washout period. The most common extension of this entails comparing n(>2) treatments in n periods, each subject receiving all n treatments. Numerous variations exist, such as designs in which each subject receives a subset of n(>2) treatments, or ones in which treatments are repeated within a subject.

Crossover designs have a number of problems that can invalidate their results. The chief difficulty concerns carryover, that is, the residual influence of treatments in subsequent treatment periods. In an additive model the effect of unequal carryover will be to bias direct treatment comparisons. In the 2×2 design the carryover effect cannot be statistically distinguished from the interaction between treatment and period and the test for either of these effects lacks power because the corresponding contrast is 'between subject'. This problem is less acute in higher order designs, but cannot be entirely dismissed.

When the crossover design is used it is therefore important to avoid carryover. This is best done by selective and careful use of the design on the basis of adequate knowledge of both the disease area and the new medication. The disease under study should be chronic and stable. The relevant effects of the medication should develop fully within the treatment period. The washout periods should be sufficiently long for complete reversibility of drug effect. The fact that these conditions are likely to be met should be established in advance of the trial by means of prior information and data.

There are additional problems that need careful attention in crossover trials. The most notable of these are the complications of analysis and interpretation arising from the loss of subjects. Also, the potential for carryover leads to difficulties in assigning adverse events which occur in later treatment periods to the appropriate treatment. These, and other issues, are described in ICH E4. The crossover design should generally be restricted to situations where losses of subjects from the trial are expected to be small.

A common, and generally satisfactory, use of the 2×2 crossover design is to demonstrate the bioequivalence of two formulations of the same medication. In this particular application in healthy volunteers, carryover effects on the relevant pharmacokinetic variable are most unlikely to occur if the washout time between the two periods is sufficiently long. However, it is still important to check this assumption during analysis on the basis of the data obtained, for example by demonstrating that no drug is detectable at the start of each period.
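As an illustration of why the 2×2 crossover is efficient, the sketch below estimates the treatment effect from within-subject period differences. It assumes an additive model with no carryover, which is exactly the condition discussed above. The function name and data layout are hypothetical.

```python
from statistics import mean

def crossover_2x2_effects(seq_ab, seq_ba):
    """Estimate treatment and period effects in a 2x2 crossover.

    seq_ab: list of (period1, period2) responses for subjects given A then B.
    seq_ba: list of (period1, period2) responses for subjects given B then A.
    Assumes an additive model with no carryover.
    """
    # Within-subject differences: period 1 minus period 2.
    d_ab = [p1 - p2 for p1, p2 in seq_ab]
    d_ba = [p1 - p2 for p1, p2 in seq_ba]
    # In sequence AB the difference is (A - B) + period effect;
    # in sequence BA it is -(A - B) + period effect.
    treatment = (mean(d_ab) - mean(d_ba)) / 2   # estimate of A minus B
    period = (mean(d_ab) + mean(d_ba)) / 2      # estimate of period 1 minus period 2
    return treatment, period
```

Because each contrast is formed within a subject, between-subject variability drops out of the treatment comparison. Unequal carryover would bias `treatment`, and this estimator cannot detect it.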

3.1.3 Factorial Designs

In a factorial design two or more treatments are evaluated simultaneously through the use of varying combinations of the treatments. The simplest example is the 2×2 factorial design in which subjects are randomly allocated to one of the four possible combinations of two treatments, A and B say. These are: A alone; B alone; both A and B; neither A nor B. In many cases this design is used for the specific purpose of examining the interaction of A and B. The statistical test of interaction may lack power to detect an interaction if the sample size was calculated based on the test for main effects. This consideration is important when this design is used for examining the joint effects of A and B, in particular, if the treatments are likely to be used together.

Another important use of the factorial design is to establish the dose-response characteristics of the simultaneous use of treatments C and D, especially when the efficacy of each monotherapy has been established at some dose in prior trials. A number, m, of doses of C is selected, usually including a zero dose (placebo), and a similar number, n, of doses of D. The full design then consists of m×n treatment groups, each receiving a different combination of doses of C and D. The resulting estimate of the response surface may then be used to help to identify an appropriate combination of doses of C and D for clinical use (see ICH E4).

In some cases, the 2×2 design may be used to make efficient use of clinical trial subjects by evaluating the efficacy of the two treatments with the same number of subjects as would be required to evaluate the efficacy of either one alone. This strategy has proved to be particularly valuable for very large mortality trials. The efficiency and validity of this approach depends upon the absence of interaction between treatments A and B so that the effects of A and B on the primary efficacy variables follow an additive model, and hence the effect of A is virtually identical whether or not it is additional to the effect of B. As for the crossover trial, evidence that this condition is likely to be met should be established in advance of the trial by means of prior information and data.
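The contrasts underlying a 2×2 factorial design can be sketched from the four cell means. This is a simplified illustration with hypothetical names. It ignores sampling variability and shows only how the main effects and the interaction are defined.

```python
def factorial_2x2_contrasts(mean_neither, mean_a, mean_b, mean_ab):
    """Main effects and interaction from the four cell means of a 2x2 factorial."""
    effect_a_without_b = mean_a - mean_neither
    effect_a_with_b = mean_ab - mean_b
    # Main effect of A: average of the effect of A in the absence and presence of B.
    main_a = (effect_a_without_b + effect_a_with_b) / 2
    main_b = ((mean_b - mean_neither) + (mean_ab - mean_a)) / 2
    # Interaction: how much the effect of A changes when B is also given.
    interaction = effect_a_with_b - effect_a_without_b
    return main_a, main_b, interaction
```

Under an additive model the interaction is zero and `main_a` uses every subject in the trial, which is the source of the efficiency described above. When the interaction is not zero, `main_a` averages two different effects and is harder to interpret.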

3.2 Multicentre Trials

Multicentre trials are carried out for two main reasons. Firstly, a multicentre trial is an accepted way of evaluating a new medication more efficiently; under some circumstances, it may present the only practical means of accruing sufficient subjects to satisfy the trial objective within a reasonable time-frame. Multicentre trials of this nature may, in principle, be carried out at any stage of clinical development. They may have several centres with a large number of subjects per centre or, in the case of a rare disease, they may have a large number of centres with very few subjects per centre.

Secondly, a trial may be designed as a multicentre (and multi-investigator) trial primarily to provide a better basis for the subsequent generalisation of its findings. This arises from the possibility of recruiting the subjects from a wider population and of administering the medication in a broader range of clinical settings, thus presenting an experimental situation that is more typical of future use. In this case the involvement of a number of investigators also gives the potential for a wider range of clinical judgement concerning the value of the medication. Such a trial would be a confirmatory trial in the later phases of drug development and would be likely to involve a large number of investigators and centres. It might sometimes be conducted in a number of different countries in order to facilitate generalisability (see Glossary) even further.

If a multicentre trial is to be meaningfully interpreted and extrapolated, then the manner in which the protocol is implemented should be clear and similar at all centres. Furthermore the usual sample size and power calculations depend upon the assumption that the differences between the compared treatments in the centres are unbiased estimates of the same quantity. It is important to design the common protocol and to conduct the trial with this background in mind. Procedures should be standardised as completely as possible. Variation of evaluation criteria and schemes can be reduced by investigator meetings, by the training of personnel in advance of the trial and by careful monitoring during the trial. Good design should generally aim to achieve the same distribution of subjects to treatments within each centre and good management should maintain this design objective. Trials that avoid excessive variation in the numbers of subjects per centre and trials that avoid a few very small centres have advantages if it is later found necessary to take into account the heterogeneity of the treatment effect from centre to centre, because they reduce the differences between different weighted estimates of the treatment effect. (This point does not apply to trials in which all centres are very small and in which centre does not feature in the analysis.) Failure to take these precautions, combined with doubts about the homogeneity of the results may, in severe cases, reduce the value of a multicentre trial to such a degree that it cannot be regarded as giving convincing evidence for the sponsor’s claims.

In the simplest multicentre trial, each investigator will be responsible for the subjects recruited at one hospital, so that ‘centre’ is identified uniquely by either investigator or hospital. In many trials, however, the situation is more complex. One investigator may recruit subjects from several hospitals; one investigator may represent a team of clinicians (subinvestigators) who all recruit subjects from their own clinics at one hospital or at several associated hospitals. Whenever there is room for doubt about the definition of centre in a statistical model, the statistical section of the protocol (see Section 5.1) should clearly define the term (e.g. by investigator, location or region) in the context of the particular trial. In most instances centres can be satisfactorily defined through the investigators and ICH E6 provides relevant guidance in this respect. In cases of doubt the aim should be to define centres so as to achieve homogeneity in the important factors affecting the measurements of the primary variables and the influence of the treatments. Any rules for combining centres in the analysis should be justified and specified prospectively in the protocol where possible, but in any case decisions concerning this approach should always be taken blind to treatment, for example at the time of the blind review.

The statistical model to be adopted for the estimation and testing of treatment effects should be described in the protocol. The main treatment effect may be investigated first using a model which allows for centre differences, but does not include a term for treatment-by-centre interaction. If the treatment effect is homogeneous across centres, the routine inclusion of interaction terms in the model reduces the efficiency of the test for the main effects. In the presence of true heterogeneity of treatment effects, the interpretation of the main treatment effect is controversial.

In some trials, for example some large mortality trials with very few subjects per centre, there may be no reason to expect the centres to have any influence on the primary or secondary variables because they are unlikely to represent influences of clinical importance. In other trials it may be recognised from the start that the limited numbers of subjects per centre will make it impracticable to include the centre effects in the statistical model. In these cases it is not appropriate to include a term for centre in the model, and it is not necessary to stratify the randomisation by centre in this situation.

If positive treatment effects are found in a trial with appreciable numbers of subjects per centre, there should generally be an exploration of the heterogeneity of treatment effects across centres, as this may affect the generalisability of the conclusions. Marked heterogeneity may be identified by graphical display of the results of individual centres or by analytical methods, such as a significance test of the treatment-by-centre interaction. When using such a statistical significance test, it is important to recognise that this generally has low power in a trial designed to detect the main effect of treatment.

If heterogeneity of treatment effects is found, this should be interpreted with care and vigorous attempts should be made to find an explanation in terms of other features of trial management or subject characteristics. Such an explanation will usually suggest appropriate further analysis and interpretation. In the absence of an explanation, heterogeneity of treatment effect as evidenced, for example, by marked quantitative interactions (see Glossary) implies that alternative estimates of the treatment effect may be required, giving different weights to the centres, in order to substantiate the robustness of the estimates of treatment effect. It is even more important to understand the basis of any heterogeneity characterised by marked qualitative interactions (see Glossary), and failure to find an explanation may necessitate further clinical trials before the treatment effect can be reliably predicted.
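One way to compare alternative estimates that give different weights to the centres is sketched below. It takes per-centre treatment effect estimates and their standard errors, and returns a precision-weighted and an equally weighted overall estimate, together with a simple heterogeneity statistic (Cochran's Q). The choice of these two weighting schemes and the function names are assumptions made for illustration.

```python
def pooled_effect(effects, weights):
    """Weighted average of per-centre treatment effect estimates."""
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)

def centre_summaries(effects, std_errors):
    """Compare two weighting schemes and compute Cochran's Q.

    effects:    per-centre estimates of the treatment difference.
    std_errors: their standard errors (same order).
    """
    inv_var = [1 / se ** 2 for se in std_errors]
    precision_weighted = pooled_effect(effects, inv_var)
    equally_weighted = pooled_effect(effects, [1] * len(effects))
    # Q is referred to a chi-squared distribution with (centres - 1) degrees of freedom.
    q = sum(w * (e - precision_weighted) ** 2 for w, e in zip(inv_var, effects))
    return {
        "precision_weighted": precision_weighted,
        "equally_weighted": equally_weighted,
        "cochran_q": q,
        "df": len(effects) - 1,
    }
```

A large gap between the two overall estimates suggests that the conclusion depends on how centres are weighted, which is the robustness concern raised above.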

Up to this point the discussion of multicentre trials has been based on the use of fixed effect models. Mixed models may also be used to explore the heterogeneity of the treatment effect. These models consider centre and treatment-by-centre effects to be random, and are especially relevant when the number of sites is large.

3.3 Type of Comparison

3.3.1 Trials to Show Superiority

Scientifically, efficacy is most convincingly established by demonstrating superiority to placebo in a placebo-controlled trial, by showing superiority to an active control treatment or by demonstrating a dose-response relationship. This type of trial is referred to as a ‘superiority’ trial (see Glossary). Generally in this guidance superiority trials are assumed, unless it is explicitly stated otherwise.

For serious illnesses, when a therapeutic treatment which has been shown to be efficacious by superiority trial(s) exists, a placebo-controlled trial may be considered unethical. In that case the scientifically sound use of an active treatment as a control should be considered. The appropriateness of placebo control vs. active control should be considered on a trial-by-trial basis.

3.3.2 Trials to Show Equivalence or Non-inferiority

In some cases, an investigational product is compared to a reference treatment without the objective of showing superiority. This type of trial is divided into two major categories according to its objective; one is an 'equivalence' trial (see Glossary) and the other is a 'non-inferiority' trial (see Glossary).

Many active control trials are designed to show that the efficacy of an investigational product is no worse than that of the active comparator, and hence fall into the latter category. Another possibility is a trial in which multiple doses of the investigational drug are compared with the recommended dose or multiple doses of the standard drug. The purpose of this design is simultaneously to show a dose-response relationship for the investigational product and to compare the investigational product with the active control.

Active control equivalence or non-inferiority trials may also incorporate a placebo, thus pursuing multiple goals in one trial; for example, they may establish superiority to placebo and hence validate the trial design and simultaneously evaluate the degree of similarity of efficacy and safety to the active comparator. There are well known difficulties associated with the use of the active control equivalence (or non-inferiority) trials that do not incorporate a placebo or do not use multiple doses of the new drug. These relate to the implicit lack of any measure of internal validity (in contrast to superiority trials), thus making external validation necessary. The equivalence (or non-inferiority) trial is not conservative in nature, so that many flaws in the design or conduct of the trial will tend to bias the results towards a conclusion of equivalence. For these reasons, the design features of such trials should receive special attention and their conduct needs special care. For example, it is especially important to minimise the incidence of violations of the entry criteria, non-compliance, withdrawals, losses to follow-up, missing data and other deviations from the protocol, and also to minimise their impact on the subsequent analyses.

Active comparators should be chosen with care. An example of a suitable active comparator would be a widely used therapy whose efficacy in the relevant indication has been clearly established and quantified in well designed and well documented superiority trial(s) and which can be reliably expected to exhibit similar efficacy in the contemplated active control trial. To this end, the new trial should have the same important design features (primary variables, the dose of the active comparator, eligibility criteria, etc.) as the previously conducted superiority trials in which the active comparator clearly demonstrated clinically relevant efficacy, taking into account advances in medical or statistical practice relevant to the new trial.

It is vital that the protocol of a trial designed to demonstrate equivalence or non-inferiority contain a clear statement that this is its explicit intention. An equivalence margin should be specified in the protocol; this margin is the largest difference that can be judged as being clinically acceptable and should be smaller than differences observed in superiority trials of the active comparator. For the active control equivalence trial, both the upper and the lower equivalence margins are needed, while only the lower margin is needed for the active control non-inferiority trial. The choice of equivalence margins should be justified clinically.

Statistical analysis is generally based on the use of confidence intervals (see Section 5.5). For equivalence trials, two-sided confidence intervals should be used. Equivalence is inferred when the entire confidence interval falls within the equivalence margins. Operationally, this is equivalent to the method of using two simultaneous one-sided tests to test the (composite) null hypothesis that the treatment difference is outside the equivalence margins versus the (composite) alternative hypothesis that the treatment difference is within the margins. Because the two null hypotheses are disjoint, the type I error is appropriately controlled. For non-inferiority trials a one-sided interval should be used. The confidence interval approach has a one-sided hypothesis test counterpart for testing the null hypothesis that the treatment difference (investigational product minus control) is equal to the lower equivalence margin versus the alternative that the treatment difference is greater than the lower equivalence margin. The choice of type I error should be a consideration separate from the use of a one-sided or two-sided procedure. Sample size calculations should be based on these methods (see Section 3.5).
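The confidence interval rules above can be sketched as follows. The sketch uses a normal approximation for the estimated treatment difference (investigational product minus control). The function names, the normal approximation and the default one-sided level of 0.025 are assumptions for the example; the choice of type I error remains a separate decision, as stated above.

```python
from statistics import NormalDist

def equivalence_shown(diff, se, lower_margin, upper_margin, alpha_one_sided=0.025):
    """Two one-sided tests, each at alpha_one_sided.

    Equivalent to checking that the two-sided (1 - 2*alpha_one_sided)
    confidence interval lies entirely within the equivalence margins.
    """
    z = NormalDist().inv_cdf(1 - alpha_one_sided)
    lower, upper = diff - z * se, diff + z * se
    return lower_margin < lower and upper < upper_margin

def noninferiority_shown(diff, se, lower_margin, alpha_one_sided=0.025):
    """One-sided interval: only the lower confidence limit is compared
    with the (negative) lower margin."""
    z = NormalDist().inv_cdf(1 - alpha_one_sided)
    return diff - z * se > lower_margin
```

For example, with an estimated difference of 0.5 and a standard error of 1.0, equivalence is shown for margins of ±3 but not for margins of ±2, because the upper confidence limit is about 2.46.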

Concluding equivalence or non-inferiority based on observing a non-significant test result of the null hypothesis that there is no difference between the investigational product and the active comparator is inappropriate.

There are also special issues in the choice of analysis sets. Subjects who withdraw or drop out of the treatment group or the comparator group will tend to have a lack of response, and hence the results of using the full analysis set (see Glossary) may be biased toward demonstrating equivalence (see Section 5.2.3).

3.3.3 Trials to Show Dose-response Relationship

How response is related to the dose of a new investigational product is a question to which answers may be obtained in all phases of development, and by a variety of approaches (see ICH E4). Dose-response trials may serve a number of objectives, amongst which the following are of particular importance: the confirmation of efficacy; the investigation of the shape and location of the dose-response curve; the estimation of an appropriate starting dose; the identification of optimal strategies for individual dose adjustments; the determination of a maximal dose beyond which additional benefit would be unlikely to occur. These objectives should be addressed using the data collected at a number of doses under investigation, including a placebo (zero dose) wherever appropriate. For this purpose the application of procedures to estimate the relationship between dose and response, including the construction of confidence intervals and the use of graphical methods, is as important as the use of statistical tests. The hypothesis tests that are used may need to be tailored to the natural ordering of doses or to particular questions regarding the shape of the dose-response curve (e.g. monotonicity). The details of the planned statistical procedures should be given in the protocol.

3.4 Group Sequential Designs

Group sequential designs are used to facilitate the conduct of interim analysis (see Section 4.5 and Glossary). While group sequential designs are not the only acceptable types of designs permitting interim analysis, they are the most commonly applied because it is more practicable to assess grouped subject outcomes at periodic intervals during the trial than on a continuous basis as data from each subject become available. The statistical methods should be fully specified in advance of the availability of information on treatment outcomes and subject treatment assignments (i.e. blind breaking, see Section 4.5). An Independent Data Monitoring Committee (see Glossary) may be used to review or to conduct the interim analysis of data arising from a group sequential design (see Section 4.6). While the design has been most widely and successfully used in large, long-term trials of mortality or major non-fatal endpoints, its use is growing in other circumstances. In particular, it is recognised that safety must be monitored in all trials and therefore the need for formal procedures to cover early stopping for safety reasons should always be considered.

3.5 Sample Size

The number of subjects in a clinical trial should always be large enough to provide a reliable answer to the questions addressed. This number is usually determined by the primary objective of the trial. If the sample size is determined on some other basis, then this should be made clear and justified. For example, a trial sized on the basis of safety questions or requirements or important secondary objectives may need larger numbers of subjects than a trial sized on the basis of the primary efficacy question (see, for example, ICH E1a).

Using the usual method for determining the appropriate sample size, the following items should be specified: a primary variable, the test statistic, the null hypothesis, the alternative ('working') hypothesis at the chosen dose(s) (embodying consideration of the treatment difference to be detected or rejected at the dose and in the subject population selected), the probability of erroneously rejecting the null hypothesis (the type I error), and the probability of erroneously failing to reject the null hypothesis (the type II error), as well as the approach to dealing with treatment withdrawals and protocol violations. In some instances, the event rate is of primary interest for evaluating power, and assumptions should be made to extrapolate from the required number of events to the eventual sample size for the trial.
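For event-driven trials, the extrapolation from a required number of events to a sample size can be sketched as below. The example uses Schoenfeld's approximation for a log-rank comparison with 1:1 allocation, which is one common choice and is not specified by this guidance. The assumed overall event probability is a planning input that would need its own justification.

```python
import math
from statistics import NormalDist

def required_events(hazard_ratio, alpha=0.05, power=0.9):
    """Schoenfeld approximation: total events for a two-sided log-rank test,
    1:1 allocation."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return math.ceil(4 * z ** 2 / math.log(hazard_ratio) ** 2)

def subjects_for_events(events, overall_event_probability):
    """Total subjects needed if a given proportion is expected to have an
    event during follow-up (an assumption to be justified in the protocol)."""
    return math.ceil(events / overall_event_probability)

events = required_events(0.75)              # 508 events
total = subjects_for_events(events, 0.30)   # 1694 subjects
```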

The method by which the sample size is calculated should be given in the protocol, together with the estimates of any quantities used in the calculations (such as variances, mean values, response rates, event rates, difference to be detected). The basis of these estimates should also be given. It is important to investigate the sensitivity of the sample size estimate to a variety of deviations from these assumptions and this may be facilitated by providing a range of sample sizes appropriate for a reasonable range of deviations from assumptions. In confirmatory trials, assumptions should normally be based on published data or on the results of earlier trials. The treatment difference to be detected may be based on a judgement concerning the minimal effect which has clinical relevance in the management of patients or on a judgement concerning the anticipated effect of the new treatment, where this is larger. Conventionally the probability of type I error is set at 5% or less or as dictated by any adjustments made necessary for multiplicity considerations; the precise choice may be influenced by the prior plausibility of the hypothesis under test and the desired impact of the results. The probability of type II error is conventionally set at 10% to 20%; it is in the sponsor’s interest to keep this figure as low as feasible especially in the case of trials that are difficult or impossible to repeat. Alternative values to the conventional levels of type I and type II error may be acceptable or even preferable in some cases.
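As a simple illustration of the sensitivity check described above, the sketch below computes the sample size per group for comparing two means with a two-sided test, using the usual normal approximation, and then tabulates it over a range of assumed standard deviations. The numerical inputs are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.9):
    """Subjects per group to detect a mean difference delta (two-sided alpha),
    normal approximation, equal allocation and common standard deviation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (sd * (z_alpha + z_beta) / delta) ** 2)

# Sensitivity of the sample size to the assumed standard deviation.
for sd in (8, 10, 12):
    print(sd, n_per_group(delta=5, sd=sd))
```

With a difference of 5 and a standard deviation of 10, the formula gives 85 subjects per group at 90% power and 63 at 80% power. A 20% error in the assumed standard deviation changes the required size by roughly 40%, which is why a range of plausible values is worth reporting.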

Sample size calculations should refer to the number of subjects required for the primary analysis. If this is the 'full analysis set', estimates of the effect size may need to be reduced compared to the per protocol set (see Glossary). This is to allow for the dilution of the treatment effect arising from the inclusion of data from patients who have withdrawn from treatment or whose compliance is poor. The assumptions about variability may also need to be revised.

The sample size of an equivalence trial or a non-inferiority trial (see Section 3.3.2) should normally be based on the objective of obtaining a confidence interval for the treatment difference that shows that the treatments differ at most by a clinically acceptable difference. When the power of an equivalence trial is assessed at a true difference of zero, then the sample size necessary to achieve this power is underestimated if the true difference is not zero. When the power of a non-inferiority trial is assessed at a zero difference, then the sample size needed to achieve that power will be underestimated if the effect of the investigational product is less than that of the active control. The choice of a 'clinically acceptable' difference needs justification with respect to its meaning for future patients, and may be smaller than the 'clinically relevant' difference referred to above in the context of superiority trials designed to establish that a difference exists.
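The underestimation described above can be made concrete with a normal-approximation sketch for a non-inferiority comparison of two means. The function name, the sign convention (margin given as a positive number, difference as investigational product minus control) and the inputs are assumptions for the example.

```python
import math
from statistics import NormalDist

def n_per_group_noninferiority(sd, margin, true_diff=0.0,
                               alpha_one_sided=0.025, power=0.9):
    """Subjects per group for a non-inferiority trial on a continuous variable.

    margin:    positive non-inferiority margin.
    true_diff: assumed true difference, investigational product minus control
               (negative if the product is somewhat less effective).
    """
    z = NormalDist().inv_cdf(1 - alpha_one_sided) + NormalDist().inv_cdf(power)
    distance = true_diff + margin  # distance between the true effect and the margin
    if distance <= 0:
        raise ValueError("assumed true difference is at or beyond the margin")
    return math.ceil(2 * (sd * z / distance) ** 2)

print(n_per_group_noninferiority(sd=10, margin=5, true_diff=0))   # 85
print(n_per_group_noninferiority(sd=10, margin=5, true_diff=-1))  # 132
```

If the product is in truth one unit worse than the control, a trial sized at zero difference has about two thirds of the subjects it needs.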

The exact sample size in a group sequential trial cannot be fixed in advance because it depends upon the play of chance in combination with the chosen stopping guideline and the true treatment difference. The design of the stopping guideline should take into account the consequent distribution of the sample size, usually embodied in the expected and maximum sample sizes.

When event rates are lower than anticipated or variability is larger than expected, methods for sample size re-estimation are available without unblinding data or making treatment comparisons (see Section 4.4).

3.6 Data Capture and Processing

The collection of data and transfer of data from the investigator to the sponsor can take place through a variety of media, including paper case record forms, remote site monitoring systems, medical computer systems and electronic transfer. Whatever data capture instrument is used, the form and content of the information collected should be in full accordance with the protocol and should be established in advance of the conduct of the clinical trial. It should focus on the data necessary to implement the planned analysis, including the context information (such as timing assessments relative to dosing) necessary to confirm protocol compliance or identify important protocol deviations. ‘Missing values’ should be distinguishable from the ‘value zero’ or ‘characteristic absent’.

The process of data capture through to database finalisation should be carried out in accordance with GCP (see ICH E6, Section 5). Specifically, timely and reliable processes for recording data and rectifying errors and omissions are necessary to ensure delivery of a quality database and the achievement of the trial objectives through the implementation of the planned analysis.

IV. TRIAL CONDUCT CONSIDERATIONS

4.1 Trial Monitoring and Interim Analysis

Careful conduct of a clinical trial according to the protocol has a major impact on the credibility of the results (see ICH E6). Careful monitoring can ensure that difficulties are noticed early and their occurrence or recurrence minimised.

There are two distinct types of monitoring that generally characterise confirmatory clinical trials sponsored by the pharmaceutical industry. One type of monitoring concerns the oversight of the quality of the trial, while the other type involves breaking the blind to make treatment comparisons (i.e. interim analysis). Both types of trial monitoring, in addition to entailing different staff responsibilities, involve access to different types of trial data and information, and thus different principles apply for the control of potential statistical and operational bias.

For the purpose of overseeing the quality of the trial the checks involved in trial monitoring may include whether the protocol is being followed, the acceptability of data being accrued, the success of planned accrual targets, the appropriateness of the design assumptions, success in keeping patients in the trials, etc. (see Sections 4.2 to 4.4). This type of monitoring does not require access to information on comparative treatment effects, nor unblinding of data and therefore has no impact on type I error. The monitoring of a trial for this purpose is the responsibility of the sponsor (see ICH E6) and can be carried out by the sponsor or an independent group selected by the sponsor. The period for this type of monitoring usually starts with the selection of the trial sites and ends with the collection and cleaning of the last subject’s data.

The other type of trial monitoring (interim analysis) involves the accruing of comparative treatment results. Interim analysis requires unblinded (i.e. key breaking) access to treatment group assignment (actual treatment assignment or identification of group assignment) and comparative treatment group summary information. This necessitates that the protocol (or appropriate amendments prior to a first analysis) contains statistical plans for the interim analysis to prevent certain types of bias. This is discussed in Sections 4.5 & 4.6.

4.2 Changes in Inclusion and Exclusion Criteria

Inclusion and exclusion criteria should remain constant, as specified in the protocol, throughout the period of subject recruitment. Changes may occasionally be appropriate, for example, in long term trials, where growing medical knowledge either from outside the trial or from interim analyses may suggest a change of entry criteria. Changes may also result from the discovery by monitoring staff that regular violations of the entry criteria are occurring, or that seriously low recruitment rates are due to over-restrictive criteria. Changes should be made without breaking the blind and should always be described by a protocol amendment which should cover any statistical consequences, such as sample size adjustments arising from different event rates, or modifications to the planned analysis, such as stratifying the analysis according to modified inclusion/exclusion criteria.

4.3 Accrual Rates

In trials with a long time-scale for the accrual of subjects, the rate of accrual should be monitored and, if it falls appreciably below the projected level, the reasons should be identified and remedial actions taken in order to protect the power of the trial and alleviate concerns about selective entry and other aspects of quality. In a multicentre trial these considerations apply to the individual centres.

4.4 Sample Size Adjustment

In long term trials there will usually be an opportunity to check the assumptions which underlay the original design and sample size calculations. This may be particularly important if the trial specifications have been made on preliminary and/or uncertain information. An interim check conducted on the blinded data may reveal that overall response variances, event rates or survival experience are not as anticipated. A revised sample size may then be calculated using suitably modified assumptions, and should be justified and documented in a protocol amendment and in the clinical study report. The steps taken to preserve blindness and the consequences, if any, for the type I error and the width of confidence intervals should be explained. The potential need for re-estimation of the sample size should be envisaged in the protocol whenever possible (see Section 3.5).
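A minimal sketch of a blinded sample size check for a continuous variable follows. It estimates the variance from the pooled data without treatment labels and recomputes the per-group sample size with the normal approximation. The rule of never going below the planned size is one possible convention, assumed here for illustration. The pooled variance also includes a contribution from any true treatment difference and therefore tends to be slightly conservative.

```python
import math
from statistics import NormalDist, variance

def reestimated_n_per_group(blinded_values, delta, planned_n,
                            alpha=0.05, power=0.9):
    """Recompute the per-group sample size from blinded interim data.

    blinded_values: responses pooled over all arms, with no treatment labels.
    delta:          treatment difference assumed in the original design.
    planned_n:      per-group size in the protocol; used here as a lower bound.
    """
    s2 = variance(blinded_values)  # overall variance, no unblinding needed
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    n_new = math.ceil(2 * s2 * z ** 2 / delta ** 2)
    return max(planned_n, n_new)
```

Because no treatment comparison is made, a check of this kind does not require breaking the blind. Any revision should still be documented as described above.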

4.5 Interim Analysis and Early Stopping

An interim analysis is any analysis intended to compare treatment arms with respect to efficacy or safety at any time prior to formal completion of a trial. Because the number, methods and consequences of these comparisons affect the interpretation of the trial, all interim analyses should be carefully planned in advance and described in the protocol. Special circumstances may dictate the need for an interim analysis that was not defined at the start of a trial. In these cases, a protocol amendment describing the interim analysis should be completed prior to unblinded access to treatment comparison data. When an interim analysis is planned with the intention of deciding whether or not to terminate a trial, this is usually accomplished by the use of a group sequential design which employs statistical monitoring schemes as guidelines (see Section 3.4). The goal of such an interim analysis is to stop the trial early if the superiority of the treatment under study is clearly established, if the demonstration of a relevant treatment difference has become unlikely or if unacceptable adverse effects are apparent. Generally, boundaries for monitoring efficacy require more evidence to terminate a trial early (i.e. they are more conservative) than boundaries for monitoring safety. When the trial design and monitoring objective involve multiple endpoints then this aspect of multiplicity may also need to be taken into account.

The protocol should describe the schedule of interim analyses, or at least the considerations which will govern its generation, for example if flexible alpha spending function approaches are to be employed; further details may be given in a protocol amendment before the time of the first interim analysis. The stopping guidelines and their properties should be clearly described in the protocol or amendments. The potential effects of early stopping on the analysis of other important variables should also be considered. This material should be written or approved by the Data Monitoring Committee (see Section 4.6), when the trial has one. Deviations from the planned procedure always bear the potential of invalidating the trial results. If it becomes necessary to make changes to the trial, any consequent changes to the statistical procedures should be specified in an amendment to the protocol at the earliest opportunity, especially discussing the impact on any analysis and inferences that such changes may cause. The procedures selected should always ensure that the overall probability of type I error is controlled.
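The idea of an alpha spending function can be illustrated with a short sketch. The two functions shown are the O'Brien-Fleming-type and Pocock-type spending functions commonly attributed to Lan and DeMets; they are given here as examples only and are not recommended by this guideline over other approaches. The sketch computes only the cumulative type I error that may be spent by each information fraction; deriving the corresponding stopping boundaries requires further numerical work that is not shown. The function names and the one-sided overall level of 0.025 are assumptions for this example.

```python
import math
from statistics import NormalDist


def obf_type_spend(t, alpha=0.025):
    """O'Brien-Fleming-type spending: cumulative alpha at information fraction t (0 < t <= 1)."""
    z = NormalDist()
    return 2 * (1 - z.cdf(z.inv_cdf(1 - alpha / 2) / math.sqrt(t)))


def pocock_type_spend(t, alpha=0.025):
    """Pocock-type spending: cumulative alpha at information fraction t (0 < t <= 1)."""
    return alpha * math.log(1 + (math.e - 1) * t)


def spending_schedule(fractions, spend, alpha=0.025):
    """Return (fraction, cumulative alpha, alpha spent at this look) for each planned look."""
    rows, previous = [], 0.0
    for t in fractions:
        cumulative = spend(t, alpha)
        rows.append((t, cumulative, cumulative - previous))
        previous = cumulative
    return rows


if __name__ == "__main__":
    for t, cumulative, increment in spending_schedule([0.25, 0.5, 0.75, 1.0], obf_type_spend):
        print(f"t={t:.2f}  cumulative={cumulative:.5f}  increment={increment:.5f}")
```

Both functions spend the whole of alpha at t = 1, but the O'Brien-Fleming-type function spends very little at early looks, so that the final analysis is carried out at a level close to the nominal one.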

The execution of an interim analysis should be a completely confidential process because unblinded data and results are potentially involved. All staff involved in the conduct of the trial should remain blind to the results of such analyses, because of the possibility that their attitudes to the trial will be modified and cause changes in the characteristics of patients to be recruited or biases in treatment comparisons. This principle may be applied to all investigator staff and to staff employed by the sponsor except for those who are directly involved in the execution of the interim analysis. Investigators should only be informed about the decision to continue or to discontinue the trial, or to implement modifications to trial procedures.

Most clinical trials intended to support the efficacy and safety of an investigational product should proceed to full completion of planned sample size accrual; trials should be stopped early only for ethical reasons or if the power is no longer acceptable. However, it is recognised that drug development plans involve the need for sponsor access to comparative treatment data for a variety of reasons, such as planning other trials. It is also recognised that only a subset of trials will involve the study of serious life-threatening outcomes or mortality which may need sequential monitoring of accruing comparative treatment effects for ethical reasons. In either of these situations, plans for interim statistical analysis should be in place in the protocol or in protocol amendments prior to the unblinded access to comparative treatment data in order to deal with the potential statistical and operational bias that may be introduced.

For many clinical trials of investigational products, especially those that have major public health significance, the responsibility for monitoring comparisons of efficacy and/or safety outcomes should be assigned to an external independent group, often called an Independent Data Monitoring Committee (IDMC), a Data and Safety Monitoring Board or a Data Monitoring Committee whose responsibilities should be clearly described.

Any interim analysis that is not planned appropriately (with or without the consequences of stopping the trial early) may flaw the results of a trial and possibly weaken confidence in the conclusions drawn. Therefore, such analyses should be avoided. If an unplanned interim analysis is conducted, the clinical study report should explain why it was necessary and the degree to which blindness had to be broken, and should provide an assessment of the potential magnitude of bias introduced and of the impact on the interpretation of the results.

4.6 Role of Independent Data Monitoring Committee (IDMC) (see Sections 1.25 and 5.52 of ICH E6)

An IDMC may be established by the sponsor to assess at intervals the progress of a clinical trial, safety data, and critical efficacy variables and recommend to the sponsor whether to continue, modify or terminate a trial. The IDMC should have written operating procedures and maintain records of all its meetings, including interim results; these should be available for review when the trial is complete. The independence of the IDMC is intended to control the sharing of important comparative information and to protect the integrity of the clinical trial from adverse impact resulting from access to trial information. The IDMC is a separate entity from an Institutional Review Board (IRB) or an Independent Ethics Committee (IEC), and its composition should include clinical trial scientists knowledgeable in the appropriate disciplines including statistics.

When there are sponsor representatives on the IDMC, their role should be clearly defined in the operating procedures of the committee (for example, covering whether or not they can vote on key issues). Since these sponsor staff would have access to unblinded information, the procedures should also address the control of dissemination of interim trial results within the sponsor organisation.

V. DATA ANALYSIS CONSIDERATIONS

5.1 Prespecification of the Analysis

When designing a clinical trial the principal features of the eventual statistical analysis of the data should be described in the statistical section of the protocol. This section should include all the principal features of the proposed confirmatory analysis of the primary variable(s) and the way in which anticipated analysis problems will be handled. In case of exploratory trials this section could describe more general principles and directions.

The statistical analysis plan (see Glossary) may be written as a separate document to be completed after finalising the protocol. In this document, a more technical and detailed elaboration of the principal features stated in the protocol may be included (see Section 7.1). The plan may include detailed procedures for executing the statistical analysis of the primary and secondary variables and other data. The plan should be reviewed and possibly updated as a result of the blind review of the data (see Section 7.1 for definition) and should be finalised before breaking the blind. Formal records should be kept of when the statistical analysis plan was finalised as well as when the blind was subsequently broken.

In the statistical section of the clinical study report the statistical methodology should be clearly described including when in the clinical trial process methodology decisions were made (see ICH E3).

5.2 Analysis Sets

The set of subjects whose data are to be included in the main analyses should be defined in the statistical section of the protocol. In addition, documentation for all subjects for whom trial procedures (e.g. run-in period) were initiated may be useful. The content of this subject documentation depends on detailed features of the particular trial, but at least demographic and baseline data on disease status should be collected whenever possible.

If all subjects randomised into a clinical trial satisfied all entry criteria, followed all trial procedures perfectly with no losses to follow-up, and provided complete data records, then the set of subjects to be included in the analysis would be self-evident. The design and conduct of a trial should aim to approach this ideal as closely as possible, but, in practice, it is doubtful if it can ever be fully achieved. Hence, the statistical section of the protocol should address anticipated problems prospectively in terms of how these affect the subjects and data to be analysed. The protocol should also specify procedures aimed at minimising any anticipated irregularities in study conduct that might impair a satisfactory analysis, including various types of protocol violations, withdrawals and missing values. The protocol should consider ways both to reduce the frequency of such problems, and also to handle the problems that do occur in the analysis of data. Possible amendments to the way in which the analysis will deal with protocol violations should be identified during the blind review. It is desirable to identify any important protocol violation with respect to the time when it occurred, its cause and influence on the trial result. The frequency and type of protocol violations, missing values, and other problems should be documented in the clinical study report and their potential influence on the trial results should be described (see ICH E3).

Decisions concerning the analysis set should be guided by the following principles: 1) to minimise bias, and 2) to avoid inflation of type I error.

5.2.1 Full Analysis Set

The intention-to-treat (see Glossary) principle implies that the primary analysis should include all randomised subjects. Compliance with this principle would necessitate complete follow-up of all randomised subjects for study outcomes. In practice this ideal may be difficult to achieve, for reasons to be described. In this document the term 'full analysis set' is used to describe the analysis set which is as complete as possible and as close as possible to the intention-to-treat ideal of including all randomised subjects. Preservation of the initial randomisation in analysis is important in preventing bias and in providing a secure foundation for statistical tests. In many clinical trials the use of the full analysis set provides a conservative strategy. Under many circumstances it may also provide estimates of treatment effects which are more likely to mirror those observed in subsequent practice.

There are a limited number of circumstances that might lead to excluding randomised subjects from the full analysis set including the failure to satisfy major entry criteria (eligibility violations), the failure to take at least one dose of trial medication and the lack of any data post randomisation. Such exclusions should always be justified. Subjects who fail to satisfy an entry criterion may be excluded from the analysis without the possibility of introducing bias only under the following circumstances:

(i) the entry criterion was measured prior to randomisation;

(ii) the detection of the relevant eligibility violations can be made completely objectively;

(iii) all subjects receive equal scrutiny for eligibility violations; (This may be difficult to ensure in an open-label study, or even in a double-blind study if the data are unblinded prior to this scrutiny, emphasising the importance of the blind review.)

(iv) all detected violations of the particular entry criterion are excluded.

In some situations, it may be reasonable to eliminate from the set of all randomised subjects any subject who took no trial medication. The intention-to-treat principle would be preserved despite the exclusion of these patients provided, for example, that the decision of whether or not to begin treatment could not be influenced by knowledge of the assigned treatment. In other situations it may be necessary to eliminate from the set of all randomised subjects any subject without data post randomisation. No analysis is complete unless the potential biases arising from these specific exclusions, or any others, are addressed.

When the full analysis set of subjects is used, violations of the protocol that occur after randomisation may have an impact on the data and conclusions, particularly if their occurrence is related to treatment assignment. In most respects it is appropriate to include the data from such subjects in the analysis, consistent with the intention-to-treat principle. Special problems arise in connection with subjects withdrawn from treatment after receiving one or more doses who provide no data after this point, and subjects otherwise lost to follow-up, because failure to include these subjects in the full analysis set may seriously undermine the approach. Measurements of primary variables made at the time of the loss to follow-up of a subject for any reason, or subsequently collected in accordance with the intended schedule of assessments in the protocol, are valuable in this context; subsequent collection is especially important in studies where the primary variable is mortality or serious morbidity. The intention to collect data in this way should be described in the protocol. Imputation techniques, ranging from the carrying forward of the last observation to the use of complex mathematical models, may also be used in an attempt to compensate for missing data. Other methods employed to ensure the availability of measurements of primary variables for every subject in the full analysis set may require some assumptions about the subjects' outcomes or a simpler choice of outcome (e.g. success / failure). The use of any of these strategies should be described and justified in the statistical section of the protocol and the assumptions underlying any mathematical models employed should be clearly explained. It is also important to demonstrate the robustness of the corresponding results of analysis especially when the strategy in question could itself lead to biased estimates of treatment effects.

Because of the unpredictability of some problems, it may sometimes be preferable to defer detailed consideration of the manner of dealing with irregularities until the blind review of the data at the end of the trial, and, if so, this should be stated in the protocol.

5.2.2 Per Protocol Set

The 'per protocol' set of subjects, sometimes described as the 'valid cases', the 'efficacy' sample or the 'evaluable subjects' sample, defines a subset of the subjects in the full analysis set who are more compliant with the protocol and is characterised by criteria such as the following:

(i) the completion of a certain pre-specified minimal exposure to the treatment regimen;

(ii) the availability of measurements of the primary variable(s);

(iii) the absence of any major protocol violations including the violation of entry criteria.

The precise reasons for excluding subjects from the per protocol set should be fully defined and documented before breaking the blind in a manner appropriate to the circumstances of the specific trial.
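The membership rules described in Sections 5.2.1 and 5.2.2 can be expressed as explicit flags derived for each subject before the blind is broken. The following is a minimal sketch under stated assumptions: the record layout, the field names, the particular exclusions applied to the full analysis set and the 28-day minimal exposure are all hypothetical, and the rules appropriate to a specific trial should be those defined in its protocol and statistical analysis plan.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    subject_id: str
    randomised: bool
    doses_taken: int
    has_post_randomisation_data: bool
    major_entry_violation: bool      # objectively detected, criterion measured before randomisation
    exposure_days: int
    primary_measured: bool
    major_protocol_violation: bool


def in_full_analysis_set(s):
    """Randomised subjects, less the limited exclusions assumed for this example."""
    return (
        s.randomised
        and s.doses_taken >= 1
        and s.has_post_randomisation_data
        and not s.major_entry_violation
    )


def in_per_protocol_set(s, min_exposure_days=28):
    """Subset of the full analysis set meeting criteria (i) to (iii) above."""
    return (
        in_full_analysis_set(s)
        and s.exposure_days >= min_exposure_days   # (i) minimal exposure (hypothetical threshold)
        and s.primary_measured                     # (ii) primary variable available
        and not s.major_protocol_violation         # (iii) no major protocol violation
    )
```

Deriving the per protocol flag from the full analysis set flag makes the subset relationship explicit, and keeping both flags on every randomised subject allows the reasons for exclusion to be tabulated by treatment group.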

The use of the per protocol set may maximise the opportunity for a new treatment to show additional efficacy in the analysis, and most closely reflects the scientific model underlying the protocol. However, the corresponding test of the hypothesis and estimate of the treatment effect may or may not be conservative depending on the trial; the bias, which may be severe, arises from the fact that adherence to the study protocol may be related to treatment and outcome.

The problems that lead to the exclusion of subjects to create the per protocol set, and other protocol violations, should be fully identified and summarised. Relevant protocol violations may include errors in treatment assignment, the use of excluded medication, poor compliance, loss to follow-up and missing data. It is good practice to assess the pattern of such problems among the treatment groups with respect to frequency and time to occurrence.

5.2.3 Roles of the Different Analysis Sets

In general, it is advantageous to demonstrate a lack of sensitivity of the principal trial results to alternative choices of the set of subjects analysed. In confirmatory trials it is usually appropriate to plan to conduct both an analysis of the full analysis set and a per protocol analysis, so that any differences between them can be the subject of explicit discussion and interpretation. In some cases, it may be desirable to plan further exploration of the sensitivity of conclusions to the choice of the set of subjects analysed. When the full analysis set and the per protocol set lead to essentially the same conclusions, confidence in the trial results is increased, bearing in mind, however, that the need to exclude a substantial proportion of subjects from the per protocol analysis throws some doubt on the overall validity of the trial.

The full analysis set and the per protocol set play different roles in superiority trials (which seek to show the investigational product to be superior), and in equivalence or non-inferiority trials (which seek to show the investigational product to be comparable, see Section 3.3.2). In superiority trials the full analysis set is used in the primary analysis (apart from exceptional circumstances) because it tends to avoid over-optimistic estimates of efficacy resulting from a per protocol analysis, since the non-compliers included in the full analysis set will generally diminish the estimated treatment effect. However, in an equivalence or non-inferiority trial use of the full analysis set is generally not conservative and its role should be considered very carefully.

5.3 Missing Values and Outliers

Missing values represent a potential source of bias in a clinical trial. Hence, every effort should be undertaken to fulfil all the requirements of the protocol concerning the collection and management of data. In reality, however, there will almost always be some missing data. A trial may be regarded as valid, nonetheless, provided the methods of dealing with missing values are sensible, and particularly if those methods are pre-defined in the protocol. Definition of methods may be refined by updating this aspect in the statistical analysis plan during the blind review. Unfortunately, no universally applicable methods of handling missing values can be recommended. An investigation should be made concerning the sensitivity of the results of analysis to the method of handling missing values, especially if the number of missing values is substantial.
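A simple form of sensitivity investigation is to compare a summary of the primary variable under two different ways of handling missing values. The sketch below contrasts an analysis of subjects with an observed final visit against carrying forward the last observation, one of the imputation techniques mentioned in Section 5.2.1. It is an illustration only: the function names and data layout are assumptions, missing visits are represented by None, and neither method is recommended here as generally appropriate.

```python
from statistics import mean


def locf(visits):
    """Carry the last observed value forward over missing (None) visits."""
    filled, last = [], None
    for value in visits:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def final_visit_means(subjects):
    """Mean final-visit value: observed cases only, and after carrying forward.

    subjects: list of per-subject visit sequences (baseline first).
    """
    observed = [v[-1] for v in subjects if v[-1] is not None]
    carried = [locf(v)[-1] for v in subjects if locf(v)[-1] is not None]
    return mean(observed), mean(carried)
```

With the three visit sequences [10, 8, 6], [10, 9, None] and [12, None, None], the observed-case mean at the final visit is 6 whereas the carried-forward mean is 9; a difference of this size would indicate that the conclusions are sensitive to the method chosen.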

5.4 Data Transformation

The decision to transform key variables prior to analysis is best made during the design of the trial on the basis of similar data from earlier clinical trials. Transformations (e.g. square root, logarithm) should be specified in the protocol and a rationale provided, especially for the primary variable(s). The general principles guiding the use of transformations to ensure that the assumptions underlying the statistical methods are met are to be found in standard texts; conventions for particular variables have been developed in a number of specific clinical areas. The decision on whether and how to transform a variable should be influenced by the preference for a scale which facilitates clinical interpretation.

Similar considerations apply to other derived variables, such as the use of change from baseline, percentage change from baseline, the 'area under the curve' of repeated measures, or the ratio of two different variables. Subsequent clinical interpretation should be carefully considered, and the derivation should be justified in the protocol. Closely related points are made in Section 2.2.2.
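The derived variables named above can be stated precisely in a few lines. The following sketch is illustrative; the function names are assumptions, and the trapezoidal rule is only one common way of computing an 'area under the curve' from repeated measures.

```python
def change_from_baseline(baseline, value):
    """Absolute change from baseline."""
    return value - baseline


def percent_change_from_baseline(baseline, value):
    """Percentage change from baseline; undefined when the baseline is zero."""
    if baseline == 0:
        raise ValueError("percentage change is undefined for a zero baseline")
    return 100.0 * (value - baseline) / baseline


def auc_trapezoid(times, values):
    """Area under the curve of repeated measures by the trapezoidal rule."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two matching time points and values")
    area = 0.0
    for i in range(1, len(times)):
        area += (times[i] - times[i - 1]) * (values[i] + values[i - 1]) / 2
    return area
```

For measurements of 2, 4 and 4 taken at times 0, 1 and 3, auc_trapezoid returns 11. The zero-baseline check is a reminder that percentage change can behave poorly for small baseline values, which is one reason the choice of derivation should be justified in the protocol.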

5.5 Estimation, Confidence Intervals and Hypothesis Testing

The statistical section of the protocol should specify the hypotheses that are to be tested and/or the treatment effects which are to be estimated in order to satisfy the primary objectives of the trial. The statistical methods to be used to accomplish these tasks should be described for the primary (and preferably the secondary) variables, and the underlying statistical model should be made clear. Estimates of treatment effects should be accompanied by confidence intervals, whenever possible, and the way in which these will be calculated should be identified. A description should be given of any intentions to use baseline data to improve precision or to adjust estimates for potential baseline differences, for example by means of analysis of covariance.

It is important to clarify whether one- or two-sided tests of statistical significance will be used, and in particular to justify prospectively the use of one-sided tests. If hypothesis tests are not considered appropriate, then the alternative process for arriving at statistical conclusions should be given. The issue of one-sided or two-sided approaches to inference is controversial and a diversity of views can be found in the statistical literature. The approach of setting type I errors for one-sided tests at half the conventional type I error used in two-sided tests is preferable in regulatory settings. This promotes consistency with the two-sided confidence intervals that are generally appropriate for estimating the possible size of the difference between two treatments.
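The consistency referred to above can be checked numerically. The sketch assumes a treatment effect estimate that is approximately normally distributed with known standard error; the function names are hypothetical. A one-sided test at the 0.025 level rejects the null hypothesis of no benefit exactly when the lower limit of the two-sided 95% confidence interval lies above zero.

```python
from statistics import NormalDist


def two_sided_ci(estimate, se, alpha=0.05):
    """Two-sided (1 - alpha) confidence interval under a normal approximation."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate - z * se, estimate + z * se


def one_sided_p(estimate, se):
    """P-value for H0: effect <= 0 against H1: effect > 0."""
    return 1 - NormalDist().cdf(estimate / se)


def conclusions_agree(estimate, se):
    """True when the one-sided 0.025 test and the two-sided 95% interval agree."""
    lower, _ = two_sided_ci(estimate, se, alpha=0.05)
    return (one_sided_p(estimate, se) < 0.025) == (lower > 0)
```

With a standard error of 1, an estimate of 2.0 gives a lower limit just above zero and a one-sided p-value below 0.025, whereas an estimate of 1.9 gives a lower limit just below zero and a p-value above 0.025.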

The particular statistical model chosen should reflect the current state of medical and statistical knowledge about the variables to be analysed as well as the statistical design of the trial. All effects to be fitted in the analysis (for example in analysis of variance models) should be fully specified, and the manner, if any, in which this set of effects might be modified in response to preliminary results should be explained. The same considerations apply to the set of covariates fitted in an analysis of covariance. (See also Section 5.7.) In the choice of statistical methods due attention should be paid to the statistical distribution of both primary and secondary variables. When making this choice (for example between parametric and non-parametric methods) it is important to bear in mind the need to provide statistical estimates of the size of treatment effects together with confidence intervals (in addition to significance tests).

The primary analysis of the primary variable should be clearly distinguished from supporting analyses of the primary or secondary variables. Within the statistical section of the protocol or the statistical analysis plan there should also be an outline of the way in which data other than the primary and secondary variables will be summarised and reported. This should include a reference to any approaches adopted for the purpose of achieving consistency of analysis across a range of trials, for example for safety data.

Modelling approaches that incorporate information on known pharmacological parameters, the extent of protocol compliance for individual subjects or other biologically based data may provide valuable insights into actual or potential efficacy, especially with regard to estimation of treatment effects. The assumptions underlying such models should always be clearly identified, and the limitations of any conclusions should be carefully described.

5.6 Adjustment of Significance and Confidence Levels

When multiplicity is present, the usual frequentist approach to the analysis of clinical trial data may necessitate an adjustment to the type I error. Multiplicity may arise, for example, from multiple primary variables (see Section 2.2.2), multiple comparisons of treatments, repeated evaluation over time and/or interim analyses (see Section 4.5). Methods to avoid or reduce multiplicity are sometimes preferable when available, such as the identification of the key primary variable (multiple variables), the choice of a critical treatment contrast (multiple comparisons), or the use of a summary measure such as 'area under the curve' (repeated measures). In confirmatory analyses, any aspects of multiplicity which remain after steps of this kind have been taken should be identified in the protocol; adjustment should always be considered and the details of any adjustment procedure or an explanation of why adjustment is not thought to be necessary should be set out in the analysis plan.
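Two widely used adjustment procedures, the Bonferroni and Holm methods, are sketched below as examples of what an adjustment for multiplicity can look like. They are not named in this guideline and are shown for illustration only; other procedures may be more suitable for a given trial. The functions return adjusted p-values in the original order, each to be compared with the overall type I error.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: each p multiplied by the number of tests."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]


def holm(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps the adjusted values monotone.
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted
```

For unadjusted p-values of 0.01, 0.04 and 0.03, the Bonferroni method gives 0.03, 0.12 and 0.09, while the Holm method gives 0.03, 0.06 and 0.06; the Holm values are never larger than the Bonferroni values.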

5.7 Subgroups, Interactions and Covariates

The primary variable(s) is often systematically related to other influences apart from treatment. For example, there may be relationships to covariates such as age and sex, or there may be differences between specific subgroups of subjects such as those treated at the different centres of a multicentre trial. In some instances an adjustment for the influence of covariates or for subgroup effects is an integral part of the planned analysis and hence should be set out in the protocol. Pre-trial deliberations should identify those covariates and factors expected to have an important influence on the primary variable(s), and should consider how to account for these in the analysis in order to improve precision and to compensate for any lack of balance between treatment groups. If one or more factors are used to stratify the design, it is appropriate to account for those factors in the analysis. When the potential value of an adjustment is in doubt, it is often advisable to nominate the unadjusted analysis as the one for primary attention, the adjusted analysis being supportive. Special attention should be paid to centre effects and to the role of baseline measurements of the primary variable. It is not advisable to adjust the main analyses for covariates measured after randomisation because they may be affected by the treatments.

The treatment effect itself may also vary with subgroup or covariate - for example, the effect may decrease with age or may be larger in a particular diagnostic category of subjects. In some cases such interactions are anticipated or are of particular prior interest (e.g. geriatrics), and hence a subgroup analysis, or a statistical model including interactions, is part of the planned confirmatory analysis. In most cases, however, subgroup or interaction analyses are exploratory and should be clearly identified as such; they should explore the uniformity of any treatment effects found overall. In general, such analyses should proceed first through the addition of interaction terms to the statistical model in question, complemented by additional exploratory analysis within relevant subgroups of subjects, or within strata defined by the covariates. When exploratory, these analyses should be interpreted cautiously; any conclusion of treatment efficacy (or lack thereof) or safety based solely on exploratory subgroup analyses is unlikely to be accepted.

5.8 Integrity of Data and Computer Software Validity

The credibility of the numerical results of the analysis depends on the quality and validity of the methods and software (both internally and externally written) used both for data management (data entry, storage, verification, correction and retrieval) and also for processing the data statistically. Data management activities should therefore be based on thorough and effective standard operating procedures. The computer software used for data management and statistical analysis should be reliable, and documentation of appropriate software testing procedures should be available.

VI. EVALUATION OF SAFETY AND TOLERABILITY

6.1 Scope of Evaluation

In all clinical trials evaluation of safety and tolerability (see Glossary) constitutes an important element. In early phases this evaluation is mostly of an exploratory nature, and is only sensitive to frank expressions of toxicity, whereas in later phases the establishment of the safety and tolerability profile of a drug can be characterised more fully in larger samples of subjects. Later phase controlled trials represent an important means of exploring in an unbiased manner any new potential adverse effects, even if such trials generally lack power in this respect.

Certain trials may be designed with the purpose of making specific claims about superiority or equivalence with regard to safety and tolerability compared to another drug or to another dose of the investigational drug. Such specific claims should be supported by relevant evidence from confirmatory trials, similar to that necessary for corresponding efficacy claims.

6.2 Choice of Variables and Data Collection

In any clinical trial the methods and measurements chosen to evaluate the safety and tolerability of a drug will depend on a number of factors, including knowledge of the adverse effects of closely related drugs, information from non-clinical and earlier clinical trials and possible consequences of the pharmacodynamic/pharmacokinetic properties of the particular drug, the mode of administration, the type of subjects to be studied, and the duration of the trial. Laboratory tests concerning clinical chemistry and haematology, vital signs, and clinical adverse events (diseases, signs and symptoms) usually form the main body of the safety and tolerability data. The occurrence of serious adverse events and treatment discontinuations due to adverse events are particularly important to register (see ICH E2A and ICH E3).

Furthermore, it is recommended that a consistent methodology be used for the data collection and evaluation throughout a clinical trial program in order to facilitate the combining of data from different trials. The use of a common adverse event dictionary is particularly important. This dictionary has a structure which makes it possible to summarise the adverse event data on three different levels: system-organ class, preferred term or included term (see Glossary). The preferred term is the level on which adverse events are usually summarised, and preferred terms belonging to the same system-organ class could then be brought together in the descriptive presentation of data (see ICH M1).

6.3 Set of Subjects to be Evaluated and Presentation of Data

For the overall safety and tolerability assessment, the set of subjects to be summarised is usually defined as those subjects who received at least one dose of the investigational drug. Safety and tolerability variables should be collected as comprehensively as possible from these subjects, including type of adverse event, severity, onset and duration (see ICH E2B). Additional safety and tolerability evaluations may be needed in specific subpopulations, such as females, the elderly (see ICH E7), the severely ill, or those who have a common concomitant treatment. These evaluations may need to address more specific issues (see ICH E3).

All safety and tolerability variables will need attention during evaluation, and the broad approach should be indicated in the protocol. All adverse events should be reported, whether or not they are considered to be related to treatment. All available data in the study population should be accounted for in the evaluation. Definitions of measurement units and reference ranges of laboratory variables should be made with care; if different units or different reference ranges appear in the same trial (e.g. if more than one laboratory is involved), then measurements should be appropriately standardised to allow a unified evaluation. Use of a toxicity grading scale should be prespecified and justified.

The incidence of a certain adverse event is usually expressed in the form of a proportion relating the number of subjects experiencing events to the number of subjects at risk. However, it is not always self-evident how to assess incidence. For example, depending on the situation the number of exposed subjects or the extent of exposure (in person-years) could be considered for the denominator. Whether the purpose of the calculation is to estimate a risk or to make a comparison between treatment groups it is important that the definition is given in the protocol. This is especially important if long-term treatment is planned and a substantial proportion of treatment withdrawals or deaths is expected. For such situations survival analysis methods should be considered and cumulative adverse event rates calculated in order to avoid the risk of underestimation.
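The different denominators discussed above lead to different numbers, as the following sketch shows. It computes a crude incidence proportion, an exposure-adjusted rate per 100 person-years and a cumulative incidence based on the Kaplan-Meier method, which is used here as one example of a survival analysis method. The function names and the data layout are assumptions, and the Kaplan-Meier calculation treats every withdrawal as non-informative censoring, which may not be appropriate when deaths or other competing events are frequent.

```python
def incidence_proportion(n_subjects_with_event, n_subjects_at_risk):
    """Crude proportion of subjects experiencing the event."""
    return n_subjects_with_event / n_subjects_at_risk


def rate_per_100_person_years(n_events, person_years):
    """Exposure-adjusted event rate per 100 person-years."""
    return 100.0 * n_events / person_years


def km_cumulative_incidence(records, horizon):
    """Kaplan-Meier cumulative incidence of a first event up to the horizon.

    records: list of (time, had_event), where time is the time to first event
    or, if had_event is False, the time at which the subject was censored.
    """
    at_risk = len(records)
    event_free = 1.0
    for t in sorted({time for time, _ in records if time <= horizon}):
        events = sum(1 for time, had_event in records if time == t and had_event)
        censored = sum(1 for time, had_event in records if time == t and not had_event)
        if events:
            event_free *= 1 - events / at_risk
        at_risk -= events + censored
    return 1 - event_free
```

For four subjects with records (1, event), (2, withdrawn), (3, event) and (4, withdrawn), the crude proportion is 0.5 but the Kaplan-Meier cumulative incidence at time 4 is 0.625, which illustrates how the crude proportion can underestimate risk when subjects withdraw early.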

In situations when there is a substantial background noise of signs and symptoms (e.g. in psychiatric trials) one should consider ways of accounting for this in the estimation of risk for different adverse events. One such method is to make use of the 'treatment emergent' (see Glossary) concept in which adverse events are recorded only if they emerge or worsen relative to pretreatment baseline.

Other methods to reduce the effect of the background noise may also be appropriate such as ignoring adverse events of mild severity or requiring that an event should have been observed at repeated visits to qualify for inclusion in the numerator. Such methods should be explained and justified in the protocol.
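The 'treatment emergent' rule described above can be sketched as a simple filter. This is only an illustration: the severity labels, the record layout and the function names are assumptions made for the example, and a real trial would follow the definitions fixed in its protocol.

```python
# Ordered severity grades; the labels are an assumption for this example.
SEVERITY = {"none": 0, "mild": 1, "moderate": 2, "severe": 3}

def is_treatment_emergent(baseline, on_treatment):
    """True if the event is new or worse than at the pre-treatment baseline."""
    return SEVERITY[on_treatment] > SEVERITY[baseline]

def treatment_emergent_events(records):
    """Keep only (subject, term) pairs that emerge or worsen on treatment.

    records: iterable of (subject_id, term, baseline_severity, on_treatment_severity)
    """
    return [
        (subject, term)
        for subject, term, baseline, on_treatment in records
        if is_treatment_emergent(baseline, on_treatment)
    ]

records = [
    ("S01", "insomnia", "moderate", "moderate"),  # unchanged background symptom
    ("S01", "nausea", "none", "mild"),            # new on treatment
    ("S02", "insomnia", "mild", "severe"),        # worsened on treatment
    ("S03", "headache", "mild", "none"),          # improved on treatment
]

emergent = treatment_emergent_events(records)
```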

6.4 Statistical Evaluation

The investigation of safety and tolerability is a multidimensional problem. Although some specific adverse effects can usually be anticipated and specifically monitored for any drug, the range of possible adverse effects is very large, and new and unforeseeable effects are always possible. Further, an adverse event experienced after a protocol violation, such as use of an excluded medication, may introduce a bias. This background underlies the statistical difficulties associated with the analytical evaluation of safety and tolerability of drugs, and means that conclusive information from confirmatory clinical trials is the exception rather than the rule.

In most trials the safety and tolerability implications are best addressed by applying descriptive statistical methods to the data, supplemented by calculation of confidence intervals wherever this aids interpretation. It is also valuable to make use of graphical presentations in which patterns of adverse events are displayed both within treatment groups and within subjects.

The calculation of p-values is sometimes useful either as an aid to evaluating a specific difference of interest, or as a 'flagging' device applied to a large number of safety and tolerability variables to highlight differences worth further attention. This is particularly useful for laboratory data, which otherwise can be difficult to summarise appropriately. It is recommended that laboratory data be subjected to both a quantitative analysis, e.g. evaluation of treatment means, and a qualitative analysis in which the numbers of values above or below certain thresholds are counted.
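A minimal sketch of the two complementary views of laboratory data, a quantitative summary (group means) and a qualitative one (counts above a threshold), might look as follows. The values, the threshold of 40 and the function name `summarise_lab` are hypothetical and serve only to illustrate the idea.

```python
from statistics import mean

def summarise_lab(values_by_group, upper_limit):
    """Return per-group mean and the number of values above the upper limit."""
    summary = {}
    for group, values in values_by_group.items():
        summary[group] = {
            "n": len(values),
            "mean": mean(values),
            "n_above": sum(v > upper_limit for v in values),
        }
    return summary

# Hypothetical laboratory values for two treatment groups.
lab_values = {
    "drug": [22, 35, 41, 58, 30],
    "placebo": [25, 28, 33, 39, 27],
}

summary = summarise_lab(lab_values, upper_limit=40)
```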

If hypothesis tests are used, statistical adjustments for multiplicity to quantify the type I error are appropriate, but the type II error is usually of more concern. Care should be taken when interpreting putative statistically significant findings when there is no multiplicity adjustment.
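The guideline does not name a specific adjustment. As one common possibility, the sketch below applies Holm's step-down procedure to a set of p-values; the choice of Holm's method, the function name and the example p-values are assumptions for illustration.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted

raw = [0.010, 0.040, 0.030, 0.200]
adjusted = holm_adjust(raw)
```

Comparing `raw` with `adjusted` shows how findings that look significant at the 0.05 level without adjustment may no longer do so once the number of comparisons is taken into account.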

In the majority of trials investigators are seeking to establish that there are no clinically unacceptable differences in safety and tolerability compared with either a comparator drug or a placebo. As is the case for non-inferiority or equivalence evaluation of efficacy, the use of confidence intervals is preferred to hypothesis testing in this situation. In this way, the considerable imprecision often arising from low frequencies of occurrence is clearly demonstrated.
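To show how a confidence interval exposes the imprecision of rare-event comparisons, the sketch below computes a 95% interval for a difference in adverse event proportions using Newcombe's hybrid score method built on Wilson intervals. The method choice, the function names and the counts (3/200 versus 1/200) are assumptions for this example; the guideline does not mandate a particular interval.

```python
import math

def wilson_interval(x, n, z=1.96):
    """Wilson score interval for a single proportion x/n."""
    p = x / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half

def newcombe_risk_difference(x1, n1, x2, n2, z=1.96):
    """Risk difference p1 - p2 with Newcombe's hybrid score interval."""
    p1, p2 = x1 / n1, x2 / n2
    l1, u1 = wilson_interval(x1, n1, z)
    l2, u2 = wilson_interval(x2, n2, z)
    diff = p1 - p2
    lower = diff - math.sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    upper = diff + math.sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
    return diff, lower, upper

# Hypothetical counts: 3 of 200 subjects on drug, 1 of 200 on placebo.
diff, lower, upper = newcombe_risk_difference(3, 200, 1, 200)
```

In this toy example the interval spans zero and is several times wider than the point estimate, which is the kind of imprecision a bare p-value would not convey.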

6.5 Integrated Summary

The safety and tolerability properties of a drug are commonly summarised across trials continuously during an investigational product’s development and in particular at the time of a marketing application. The usefulness of this summary, however, is dependent on adequate and well-controlled individual trials with high data quality.

The overall usefulness of a drug is always a question of balance between risk and benefit and in a single trial such a perspective could also be considered, even if the assessment of risk/benefit usually is performed in the summary of the entire clinical trial program. (See section 7.2.2)

For more details on the reporting of safety and tolerability, see Chapter 12 of ICH E3.

VII. REPORTING

7.1 Evaluation and Reporting

As stated in the Introduction, the structure and content of clinical study reports is the subject of ICH E3. That ICH guidance fully covers the reporting of statistical work, appropriately integrated with clinical and other material. The current section is therefore relatively brief.

During the planning phase of a trial the principal features of the analysis should have been specified in the protocol as described in Section 5. When the conduct of the trial is over and the data are assembled and available for preliminary inspection, it is valuable to carry out the blind review of the planned analysis also described in Section 5. This pre-analysis review, blinded to treatment, should cover decisions concerning, for example, the exclusion of subjects or data from the analysis sets; possible transformations may also be checked, and outliers defined; important covariates identified in other recent research may be added to the model; the use of parametric or non-parametric methods may be reconsidered. Decisions made at this time should be described in the report, and should be distinguished from those made after the statistician has had access to the treatment codes, as blind decisions will generally introduce less potential for bias. Statisticians or other staff involved in unblinded interim analysis should not participate in the blind review or in making modifications to the statistical analysis plan. When the blinding is compromised by the possibility that treatment induced effects may be apparent in the data, special care will be needed for the blind review.

Many of the more detailed aspects of presentation and tabulation should be finalised at or about the time of the blind review so that by the time of the actual analysis full plans exist for all its aspects including subject selection, data selection and modification, data summary and tabulation, estimation and hypothesis testing. Once data validation is complete, the analysis should proceed according to the pre-defined plans; the more these plans are adhered to, the greater the credibility of the results. Particular attention should be paid to any differences between the planned analysis and the actual analysis as described in the protocol, protocol amendments or the updated statistical analysis plan based on a blind review of data. A careful explanation should be provided for deviations from the planned analysis.

All subjects who entered the trial should be accounted for in the report, whether or not they are included in the analysis. All reasons for exclusion from analysis should be documented; for any subject included in the full analysis set but not in the per protocol set, the reasons for exclusion from the latter should also be documented. Similarly, for all subjects included in an analysis set, the measurements of all important variables should be accounted for at all relevant time-points.

The effect of all losses of subjects or data, withdrawals from treatment and major protocol violations on the main analyses of the primary variable(s) should be considered carefully. Subjects lost to follow up, withdrawn from treatment, or with a severe protocol violation should be identified, and a descriptive analysis of them provided, including the reasons for their loss and its relationship to treatment and outcome.

Descriptive statistics form an indispensable part of reports. Suitable tables and/or graphical presentations should illustrate clearly the important features of the primary and secondary variables and of key prognostic and demographic variables. The results of the main analyses relating to the objectives of the trial should be the subject of particularly careful descriptive presentation. When reporting the results of significance tests, precise p-values (e.g. 'p=0.034') should be reported rather than making exclusive reference to critical values.

Although the primary goal of the analysis of a clinical trial should be to answer the questions posed by its main objectives, new questions based on the observed data may well emerge during the unblinded analysis. Additional and perhaps complex statistical analysis may be the consequence. This additional work should be strictly distinguished in the report from work which was planned in the protocol.

The play of chance may lead to unforeseen imbalances between the treatment groups in terms of baseline measurements not pre-defined as covariates in the planned analysis but having some prognostic importance nevertheless. This is best dealt with by showing that an additional analysis which accounts for these imbalances reaches essentially the same conclusions as the planned analysis. If this is not the case, the effect of the imbalances on the conclusions should be discussed.

In general, sparing use should be made of unplanned analyses. Such analyses are often carried out when it is thought that the treatment effect may vary according to some other factor or factors. An attempt may then be made to identify subgroups of subjects for whom the effect is particularly beneficial. The potential dangers of over-interpretation of unplanned subgroup analyses are well known (see also Section 5.7), and should be carefully avoided. Although similar problems of interpretation arise if a treatment appears to have no benefit, or an adverse effect, in a subgroup of subjects, such possibilities should be properly assessed and should therefore be reported.

Finally, statistical judgement should be brought to bear on the analysis, interpretation and presentation of the results of a clinical trial. To this end the trial statistician should be a member of the team responsible for the clinical study report, and should approve the clinical report.

7.2 Summarising the Clinical Database

An overall summary and synthesis of the evidence on safety and efficacy from all the reported clinical trials is required for a marketing application (Expert report in EU, integrated summary reports in USA, Gaiyo in Japan). This may be accompanied, when appropriate, by a statistical combination of results.

Within the summary a number of areas of specific statistical interest arise: describing the demography and clinical features of the population treated during the course of the clinical trial programme; addressing the key questions of efficacy by considering the results of the relevant (usually controlled) trials and highlighting the degree to which they reinforce or contradict each other; summarising the safety information available from the combined database of all the trials whose results contribute to the marketing application and identifying potential safety issues. During the design of a clinical programme careful attention should be paid to the uniform definition and collection of measurements which will facilitate subsequent interpretation of the series of trials, particularly if they are likely to be combined across trials. A common dictionary for recording the details of medication, medical history and adverse events should be selected and used. A common definition of the primary and secondary variables is nearly always worthwhile, and essential for meta-analysis. The manner of measuring key efficacy variables, the timing of assessments relative to randomisation/entry, the handling of protocol violators and deviators and perhaps the definition of prognostic factors, should all be kept compatible unless there are valid reasons not to do so.

Any statistical procedures used to combine data across trials should be described in detail. Attention should be paid to the possibility of bias associated with the selection of trials, to the homogeneity of their results, and to the proper modelling of the various sources of variation. The sensitivity of conclusions to the assumptions and selections made should be explored.

7.2.1 Efficacy Data

Individual clinical trials should always be large enough to satisfy their objectives. Additional valuable information may also be gained by summarising a series of clinical trials which address essentially identical key efficacy questions. The main results of such a set of trials should be presented in an identical form to permit comparison, usually in tables or graphs which focus on estimates plus confidence limits. The use of meta-analytic techniques to combine these estimates is often a useful addition, because it allows a more precise overall estimate of the size of the treatment effects to be generated, and provides a complete and concise summary of the results of the trials. Under exceptional circumstances a meta-analytic approach may also be the most appropriate way, or the only way, of providing sufficient overall evidence of efficacy via an overall hypothesis test. When used for this purpose the meta-analysis should have its own prospectively written protocol.
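As a rough illustration of combining trial-level estimates, the sketch below pools treatment effect estimates with inverse-variance (fixed-effect) weights and reports Cochran's Q as a simple check on the homogeneity of results. The fixed-effect model, the function name and the three example trials are assumptions; the guideline leaves the choice of model to the analyst, who should describe and justify it.

```python
import math

def fixed_effect_pool(estimates, std_errors, z=1.96):
    """Inverse-variance pooled estimate, 95% CI and Cochran's Q statistic."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    se_pooled = math.sqrt(1.0 / total)
    # Q measures how far individual trial estimates scatter around the pooled value.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    return {
        "estimate": pooled,
        "ci": (pooled - z * se_pooled, pooled + z * se_pooled),
        "q": q,
        "df": len(estimates) - 1,
    }

# Hypothetical treatment differences and standard errors from three trials.
result = fixed_effect_pool([2.0, 3.0, 2.5], [1.0, 1.0, 0.5])
```

The pooled standard error is smaller than that of any single trial, which is the gain in precision referred to above; a large Q relative to its degrees of freedom would call the assumption of a common effect into question.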

7.2.2 Safety Data

In summarising safety data it is important to examine the safety database thoroughly for any indications of potential toxicity, and to follow up any indications by looking for an associated supportive pattern of observations. The combination of the safety data from all human exposure to the drug provides an important source of information, because its larger sample size provides the best chance of detecting the rarer adverse events and, perhaps, of estimating their approximate incidence. However, incidence data from this database are difficult to evaluate because of the lack of a comparator group, and data from comparative trials are especially valuable in overcoming this difficulty. The results from trials which use a common comparator (placebo or specific active comparator) should be combined and presented separately for each comparator providing sufficient data.

All indications of potential toxicity arising from exploration of the data should be reported. The evaluation of the reality of these potential adverse effects should take account of the issue of multiplicity arising from the numerous comparisons made. The evaluation should also make appropriate use of survival analysis methods to exploit the potential relationship of the incidence of adverse events to duration of exposure and/or follow-up. The risks associated with identified adverse effects should be appropriately quantified to allow a proper assessment of the risk/benefit relationship.

GLOSSARY

Glossary	Content
Bayesian Approaches	Approaches to data analysis that provide a posterior probability distribution for some parameter (e.g. treatment effect), derived from the observed data and a prior probability distribution for the parameter. The posterior distribution is then used as the basis for statistical inference.
Bias (Statistical & Operational)	The systematic tendency of any factors associated with the design, conduct, analysis and evaluation of the results of a clinical trial to make the estimate of a treatment effect deviate from its true value. Bias introduced through deviations in conduct is referred to as 'operational' bias. The other sources of bias listed above are referred to as 'statistical'.
Blind Review	The checking and assessment of data during the period of time between trial completion (the last observation on the last subject) and the breaking of the blind, for the purpose of finalising the planned analysis.
Content Validity	The extent to which a variable (e.g. a rating scale) measures what it is supposed to measure.
Double-Dummy	A technique for retaining the blind when administering supplies in a clinical trial, when the two treatments cannot be made identical. Supplies are prepared for Treatment A (active and indistinguishable placebo) and for Treatment B (active and indistinguishable placebo). Subjects then take two sets of treatment; either A (active) and B (placebo), or A (placebo) and B (active).
Dropout	A subject in a clinical trial who for any reason fails to continue in the trial until the last visit required of him/her by the study protocol.
Equivalence Trial	A trial with the primary objective of showing that the response to two or more treatments differs by an amount which is clinically unimportant. This is usually demonstrated by showing that the true treatment difference is likely to lie between a lower and an upper equivalence margin of clinically acceptable differences.
Frequentist Methods	Statistical methods, such as significance tests and confidence intervals, which can be interpreted in terms of the frequency of certain outcomes occurring in hypothetical repeated realisations of the same experimental situation.
Full Analysis Set	The set of subjects that is as close as possible to the ideal implied by the intention-to-treat principle. It is derived from the set of all randomised subjects by minimal and justified elimination of subjects.
Generalisability, Generalisation	The extent to which the findings of a clinical trial can be reliably extrapolated from the subjects who participated in the trial to a broader patient population and a broader range of clinical settings.
Global Assessment Variable	A single variable, usually a scale of ordered categorical ratings, which integrates objective variables and the investigator's overall impression about the state or change in state of a subject.
Independent Data Monitoring Committee (IDMC) (Data and Safety Monitoring Board, Monitoring Committee, Data Monitoring Committee)	An independent data-monitoring committee that may be established by the sponsor to assess at intervals the progress of a clinical trial, the safety data, and the critical efficacy endpoints, and to recommend to the sponsor whether to continue, modify, or stop a trial.
Intention-To-Treat Principle	The principle that asserts that the effect of a treatment policy can be best assessed by evaluating on the basis of the intention to treat a subject (i.e. the planned treatment regimen) rather than the actual treatment given. It has the consequence that subjects allocated to a treatment group should be followed up, assessed and analysed as members of that group irrespective of their compliance to the planned course of treatment.
Interaction (Qualitative & Quantitative)	The situation in which a treatment contrast (e.g. difference between investigational product and control) is dependent on another factor (e.g. centre). A quantitative interaction refers to the case where the magnitude of the contrast differs at the different levels of the factor, whereas for a qualitative interaction the direction of the contrast differs for at least one level of the factor.
Inter-Rater Reliability	The property of yielding equivalent results when used by different raters on different occasions.
Intra-Rater Reliability	The property of yielding equivalent results when used by the same rater on different occasions.
Interim Analysis	Any analysis intended to compare treatment arms with respect to efficacy or safety at any time prior to the formal completion of a trial.
Meta-Analysis	The formal evaluation of the quantitative evidence from two or more trials bearing on the same question. This most commonly involves the statistical combination of summary statistics from the various trials, but the term is sometimes also used to refer to the combination of the raw data.
Multicentre Trial	A clinical trial conducted according to a single protocol but at more than one site, and therefore, carried out by more than one investigator.
Non-Inferiority Trial	A trial with the primary objective of showing that the response to the investigational product is not clinically inferior to a comparative agent (active or placebo control).
Preferred and Included Terms	In a hierarchical medical dictionary, for example MedDRA, the included term is the lowest level of dictionary term to which the investigator description is coded. The preferred term is the level of grouping of included terms typically used in reporting frequency of occurrence. For example, the investigator text “Pain in the left arm” might be coded to the included term “Joint pain”, which is reported at the preferred term level as “Arthralgia”.
Per Protocol Set (Valid Cases, Efficacy Sample, Evaluable Subjects Sample)	The set of data generated by the subset of subjects who complied with the protocol sufficiently to ensure that these data would be likely to exhibit the effects of treatment, according to the underlying scientific model. Compliance covers such considerations as exposure to treatment, availability of measurements and absence of major protocol violations.
Safety & Tolerability	The safety of a medical product concerns the medical risk to the subject, usually assessed in a clinical trial by laboratory tests (including clinical chemistry and haematology), vital signs, clinical adverse events (diseases, signs and symptoms), and other special safety tests (e.g. ECGs, ophthalmology). The tolerability of the medical product represents the degree to which overt adverse effects can be tolerated by the subject.
Statistical Analysis Plan	A statistical analysis plan is a document that contains a more technical and detailed elaboration of the principal features of the analysis described in the protocol, and includes detailed procedures for executing the statistical analysis of the primary and secondary variables and other data.
Superiority Trial	A trial with the primary objective of showing that the response to the investigational product is superior to a comparative agent (active or placebo control).
Surrogate Variable	A variable that provides an indirect measurement of effect in situations where direct measurement of clinical effect is not feasible or practical.
Treatment Effect	An effect attributed to a treatment in a clinical trial. In most clinical trials the treatment effect of interest is a comparison (or contrast) of two or more treatments.
Treatment Emergent	An event that emerges during treatment having been absent pre-treatment, or worsens relative to the pre-treatment state.
Trial Statistician	A statistician who has a combination of education/training and experience sufficient to implement the principles in this guidance and who is responsible for the statistical aspects of the trial.

ICH E9

ICH E9 临床试验的统计学原则

English Version

ICH E9 Statistical Principles for Clinical Trials

1. 引言

1.1 背景与目的

医药产品的有效性和安全性需由临床试验来论证。所采用的临床试验需遵循ICH在1996年5月1日通过的“良好临床实践（GCP）：综合指南”（ICH E6）。 ICH E6已阐明统计学在临床试验设计和分析中不可或缺的作用。由于统计学研究在临床试验领域的不断发展，加之临床研究在药物审批流程及一般医疗保健中的重要作用，因此，有必要制订一份关于临床试验统计学问题的简明文件。本指南旨在协调在欧洲、日本和美国提交上市申请的临床试验所应用的统计学方法的原则。

作为起点，本指南使用了欧盟专利医药产品委员会（CPMP）在题为《用于申请医药产品上市许可的临床试验生物统计学方法》（1994年12月）指南的意见，并参照了日本厚生省的《临床研究中的统计分析指南》（1992年3月）和美国食品药品监督管理局的《新药申请中临床与统计部分的格式与内容指南》（1988年7月）。其他ICH指南也包含一些与统计学原则和方法有关的主题，特别是下面所列的指南。本指南的各个部分会对包含相关内容的特定指南进行标注。

E1A:	人群暴露程度对评价临床安全性的影响
E2A:	临床安全性数据管理：快速报告的定义与标准
E2B:	临床安全性数据管理：个例安全报告传输数据元素
E2C:	临床安全性数据管理：上市药品的定期安全性更新报告
E3:	临床研究报告的结构与内容
E4:	支持药品注册的剂量反应信息
E5:	国外临床数据可接受性的种族因素
E6:	良好临床实践：综合指南
E7:	特殊人群的支持性研究：老年医学
E8:	临床试验的一般考虑
E10:	临床试验中对照组的选择
M1:	用于监管目的的医学术语标准化
M3:	用于实施药物人体临床试验的非临床安全性研究

本指南旨在为申办方在整体临床研发背景下，对研究产品临床试验的设计、实施、分析和评价提供指导。本指南也将会帮助科学专家准备上市申请总结报告或者评价主要来自研发后期的临床试验的有效性和安全性证据。

1.2 范围与方向

本指南的重点是统计学原则，并不涉及具体统计步骤或方法的使用。确保这些原则得到正确实施的具体程序性步骤是申办方的职责。本指南对不同临床试验之间的数据整合亦作了讨论，但并不作为重点。其他ICH指南涵盖了与数据管理及临床试验监查活动有关的原则和程序，此处不再赘述。

本指南对很多科学学科的人士都是有意义的。然而，正如ICH E6 所述，我们假定所有与临床试验有关的统计工作的实际职责由训练有素且经验丰富的统计师承担。试验统计师（见词汇表）在与其他临床试验专家合作时，其作用和职责是确保在支持药物研发的临床试验中恰当地应用统计学原则。因此，试验统计师应同时具备足够的教育/训练和经验以贯彻本指南所阐明的原则。

对于每一个用于上市申请的临床试验，有关设计、实施和拟采用的统计分析的主要特征等重要细节需在研究方案中阐明。对方案中步骤的遵循程度和主要分析预先计划的程度，都将决定试验最终结果和结论的可信度。方案及后续修订应获得包括试验统计师在内的责任人员的批准。试验统计师应恰当使用技术术语，保证方案以及任何修订都能清楚准确地涵盖所有相关的统计问题。

本指南所述的原则主要与研发后期实施的临床试验有关，其中很多是有效性的确证性试验。除有效性外，确证性试验也可把安全性指标（如不良事件、临床实验室指标或心电图测量）、药效学或药代动力学指标（如确证性的生物等效性试验）作为主要指标。其次，有些确证性结果可能来源于不同试验的整合数据，本指南有些原则适用于这种情况。最后，虽然药物研发早期本质上以探索性临床试验为主，但统计学原则也与这些临床试验有关。因此，本指南应尽可能地应用于临床研发的各个阶段。

本指南所描述的很多原则致力于最小化偏倚（见词汇表）和最大化精度。这里的术语“偏倚”是指与临床试验设计、实施、分析和结果解释有关的任何因素所导致的处理效应（见词汇表）的估计值与真实值偏离的系统性趋势。应尽可能地识别偏倚的潜在来源，以便采取措施限制这些偏倚。偏倚的存在可能严重削弱从临床试验中得出正确结论的能力。

有些偏倚源于试验设计，例如，在处理分配过程中将风险较低的受试者系统地分配到其中一个处理组。其他偏倚源于临床试验的实施和分析。例如，违背方案且基于对受试者结局的认识从分析中排除受试者是偏倚的可能来源，这可能影响处理效应的准确估计。偏倚常在不知不觉中发生，且难以直接测量，因而评价试验结果和主要结论的稳健性是重要的。稳健性是一个概念，是指整体结论对数据的各种限制、假设和数据分析方法的敏感性。稳健性意味着，当基于另一假设或分析方法进行分析时，试验的处理效应和主要结论不会受到实质性的影响。在对处理效应和处理间比较的不确定性的统计测量进行解释时，应考虑偏倚对P值、置信区间或推断的潜在影响。

由于临床试验设计和分析的主要方法基于频率派统计方法，因此在讨论假设检验和/或置信区间时，本指南主要使用频率派方法（见词汇表）。这并不意味着其它方法不可取，如果理由充分且所得结论足够稳健，则贝叶斯方法（见词汇表）及其他方法亦可考虑。

2. 总体临床研发的考虑

2.1 试验背景

2.1.1 研发计划

新药临床研发过程的广义目标是发现药物是否在某一剂量范围和用法上能够显示出既安全又有效，且其风险获益关系能够被接受。可能从药物获益的特定对象以及特定的适应症也需要被定义。

满足这些目标通常需要一系列循序渐进的临床试验，每一个临床试验有其特定目的（见ICH E8），应该在一个或一系列临床计划中明确，这些计划应具有适当的决策点和随知识累积而进行修订的灵活性。上市申请应清晰地描述这些计划的主要内容和每个试验的作用。对整个试验项目证据的解释和评价需要综合单个试验的证据（见第7.2章节），为此应确保试验在一些特征上采用通用标准，如医学术语词典、主要测量的定义与时点、方案违背的处理，等等。当医学问题通过一个以上的试验来回答时，统计汇总、综述或meta分析（见词汇表）可能会有用。应尽量在计划中考虑到这一点，以便清晰地确定相关的试验，并且预先指定必要的设计方面的共同特征。应该在该计划中阐述可能会涉及整体计划中若干试验的其他主要统计学问题（如果有的话）。

2.1.2 确证性试验

确证性试验是一种预先提出假设并进行评价的具有充分对照的试验。原则上确证性试验需要提供有效性或安全性的确凿证据。此类试验中，感兴趣的关键假设通常需预先定义，应能直接反映试验的主要目的，且在试验完成后得到检验。在确证性试验中，以适当的精度估计处理效应的大小，与把这些效应和临床意义联系起来同等重要。

确证性试验旨在提供确凿证据以支持主张，因此，按照方案及标准操作规程进行试验尤为重要。应该解释和书面记录不可避免的变化，并考察它们的影响。此类试验设计的合理性以及其它重要的统计方面，如计划分析的主要特征，均应写入方案。每个试验应仅解决有限的问题。

支持所主张的确凿证据要求确证性试验的结果证实研究产品具有临床获益。因此确证性试验应清晰明确地回答每一个与有效性或安全性主张有关的关键临床问题。另外，推论（见词汇表）到目标患者人群的基础得以理解和解释很重要，这也会影响到所需研究中心和/或试验的数量和参与人员（如专家或全科医师）。确证性试验的结果应当是稳健的。某些情况下，单一确证性试验所提供证据强度可能就足够了。

2.1.3 探索性试验

确证性试验的理论基础和设计几乎总是依赖于一系列早期探索性临床研究工作。这些探索性研究和所有临床试验一样应有清晰和明确的目的，但与确证性试验相比，它们的目的并不总是对预先定义的假设进行简单检验。此外，探索性试验可能有时需要采用更灵活的方法进行设计，以便根据积累的结果更改设计。它们的分析可能仅限于数据探索，也可能进行假设检验，但假设的拟定可能依赖于数据。尽管这类试验可能对整体的相关证据有贡献，但不能作为证明有效性的正式依据。

任何试验可能同时具有确证性和探索性两个方面。例如，在大多数确证性试验中，也会对数据进行探索性分析，作为解释和支持研究发现、为后期研究提出进一步假设的基础。方案应明确区分进行确证试验和对数据做探索性分析的两种不同情况。

2.2 试验范围

2.2.1 人群

在药物研发的早期阶段，临床试验受试者的选择在很大程度上受到主观愿望的影响，即希望最大可能地观察到感兴趣的特定临床疗效，因此，研究对象往往是药物最终适用的患者总体中一个非常局限的亚组。但在开展确证性试验的时候，试验受试者应更能反映目标人群。因此，在保持足够的同质性以精确估计处理效应的同时，尽可能放宽目标人群的纳入和排除标准，这对确证性试验是有益的。由于地理位置、实施时间、特定研究者和诊所的医疗实践等因素的影响，任何一个临床试验都不可能完全代表将来的用药者。尽管如此，应尽可能减少这些因素的影响，并在解释试验结果时充分讨论。

2.2.2 主要和次要指标

主要指标（又称“目标”指标，主要终点）应能够提供与试验主要目的直接相关的最具临床相关性和说服力的证据。通常应只设置一个主要指标。因大部分确证性试验的主要目的是提供与有效性相关的强有力的科学证据，所以主要指标通常是有效性指标。安全性/耐受性有时也可能是主要指标，且会一直是一种重要的考量。有关生活质量和卫生经济的指标是进一步的潜在主要指标。主要指标的选择应反映相关研究领域公认的准则和标准。建议使用在早期研究或发表文献中获得的具有实践经验的可靠且已验证的指标。在纳入和排除标准所描述的患者人群中，应该有充分的证据说明主要指标能够有效和可靠地度量临床相关的和重要的治疗获益。主要指标通常用于样本量估计（见第3.5章节）。

很多情况下，评价受试者结局的方法可能并不直接，应仔细定义。例如，将死亡率作为主要指标而无进一步说明是不够的，因为对死亡率的评价可以是比较某些固定时点的存活比例，也可以是比较在特定时域内生存时间的总体分布。另一个常见的例子是复发事件，处理效应的测量可以是简单的二分类指标（特定时期内的任何复发）、首次复发的时间、复发率（观察的单位时间的事件数），等等。在评价慢性病的处理效应时，随时间变化的功能状态对选择主要指标提出了其他挑战。相应的方法有多种，例如，观察期开始和结束时所做评价的比较、由观察期所有评价求得的斜率的比较、超过或低于规定阈值的受试者比例的比较、基于重复测量数据方法的比较。为避免因事后定义所产生的多重性担忧，在方案中规定主要指标的精确定义至关重要，因为该定义将用于统计分析。另外，所选择的具体主要指标的临床相关性和相关测量过程的合理性通常需要在方案中阐明。

主要指标及其选择理由应在方案中详细说明。揭盲后重新定义主要指标通常是不可接受的，因为由此引入的偏倚很难评价。当根据主要目的确定的临床效应存在多种测量方法时，应根据临床相关性、重要性、客观性、和/或其它相关特性，在方案中选择其中一种切实可行的测量方法作为主要指标。

次要指标是与主要目的相关的支持性指标，或与次要目的相关的效应指标。在方案中预先定义次要指标，并说明它们的相对重要性以及在解释试验结果时的作用也很重要。次要指标的数量应有限制，且与试验要回答的有限问题相关。

2.2.3 复合指标

当与主要目的相关的多种测量方法中难以确定单一的主要指标时，另一种有用的策略是按预先确定的计算方法将多个指标组合成一个单一或“复合”指标。主要指标有时以多种临床测量方法相组合的形式出现（如关节炎、精神疾病和其它疾病使用的量表），这虽涉及多重性问题，但无需调整I类错误。将多个指标组合的方法应在方案中详细说明，且应以临床获益的大小对结果进行解释。当复合指标被用作主要指标时，可以对复合指标中有临床意义的单个指标进行单独分析。当量表被用作主要指标时，阐明内容效度（见词汇表）、评价者内和评价者间信度（见词汇表）及检测疾病严重程度变化的反应度等尤其重要。

2.2.4 全局评价指标

在某些情况下，全局评价指标（见词汇表）用于评价某个处理的整体安全性、有效性和/或实用性。这种指标类型整合了客观指标和研究者对受试者的状态或状态变化的总体印象，它通常是一个有序分类量表。整体有效性的全局评价方法已经用于某些治疗领域，如神经病学和精神病学。

全局评价指标一般带有主观成分。使用全局评价指标作为主要或次要指标时，应该在方案中对量表的以下方面进行详细说明：

1) 量表与试验主要目的的相关性；

2) 量表的效度和信度基础；

3) 如何根据所收集的数据将个体受试者归类于量表中的特定类别；

4) 如何将有缺失数据的受试者归类于量表中的特定类别，或用其他方法评价。

若研究者选取的全局评价指标中包含客观指标，则这些客观指标应作为附加的主要指标，或至少作为重要的次要指标。

全局实用性评价综合了获益与风险两方面因素，反映了经治医生的决策过程，即医生在做出使用产品的决策时，必须权衡获益与风险。全局实用性指标会产生这样的问题，即某些情况下会将获益和不良反应方面差别很大的两种产品判断为等效。例如，将一种治疗的全局实用性指标判断为等效于或优效于另一种治疗时，可能掩盖了其疗效甚微或无效但不良反应较少的事实。因此不建议将全局实用性指标作为主要指标。如果全局实用性指标被用作主要指标，则将特定的有效性和安全性结局分别作为附加的主要指标考虑是非常重要的。

2.2.5 多个主要指标

有时需要使用一个以上的主要指标，且每一个指标（或其中一个子集）都足以涵盖其治疗效果的范围。解释这类证据的既定方式应当详细说明，即应该说明对任一指标，或最少几个指标，或全部指标的影响是否被认为是达到试验目的所必需的。应该针对已定义的主要指标清楚地说明主要假设或相关的假设与参数（如均数、百分数、分布），并清楚地叙述统计推断方法。因为存在潜在的多重性问题，所以应解释对I类错误的影响（见第5.6章节），也应在方案中给出控制I 类错误的方法。在评价对I类错误的影响时，所提出的主要指标之间的相关程度也需要考虑。如果试验目的是证实所有主要指标的效果，则无需调整I类错误，但必须仔细考虑对 II 类错误和样本量的影响。

2.2.6 替代指标

当通过观察实际临床有效性直接评价受试者的临床获益不可行时，可以考虑间接标准（替代指标—见词汇表）。一些被认为可以预测临床获益的指标通常可作为替代指标。确定替代指标有两个主要关注点：第一，它可能不是相关临床结局的真正预测因子，例如，它可以测量与一个特定药理学机制有关的治疗活性，但不能提供治疗的作用范围与最终效果的全部信息，无论是阳性还是阴性。许多例证表明，治疗在替代指标显示出高度阳性效应，而最终被证明对受试者的临床结局是有害的。与此相反，也有一些例证显示，治疗的临床获益明确却未能在替代指标体现。第二，替代指标可能不会定量测量可直接权衡不良反应的临床获益。验证替代指标的统计学标准已经具备，但是使用它们的经验相对有限。在实践中，替代证据的强度取决于（1）替代关系的生物学合理性；（2）流行病学研究证明替代指标对临床结局的预后价值；（3）临床试验证明替代指标的处理效应相当于临床结局的效应。一种产品的临床指标和替代指标之间的关系并不一定适用于治疗同一种疾病但具有不同作用方式的另一种产品。

2.2.7 分类指标

连续型或等级指标有时可能需要转化为二分类或其他分类指标。“成功”和“应答”的标准是二分类的常见例子。分类标准需明确规定，例如，连续型指标最小百分比的改善（相对于基线），或者有序等级量表中等于或高于某个阈值水平（如“良”）的按顺序分类。

舒张压降至90mmHg以下是一个常见的二分类例子。当分类有明确的临床相关性时，它们是最有用的。众所周知，选择分类标准很容易使临床结果产生偏倚，因此在方案中应预先定义和特别说明分类标准。由于分类通常意味着信息丢失，因此在分析中会损失检验效能，样本量计算时需加以考虑。

2.3 避免偏倚的设计技术

临床试验中，避免偏倚的最重要的设计技术是盲法和随机化，它们为上市申请中大多数对照临床试验所常规采用。大多数此类试验采用双盲法，按照合适的随机化方案，对治疗药物进行预先包装并提供给试验中心，只标明受试者编号和疗程，从而使参与试验的任何人都不知道分配给任何特定受试者的具体治疗药物，甚至不知道编码字母。该方法会在第2.3.1 章节和第2.3.2章节中的大部分内容中进行介绍，例外情况会在最后考虑。

设计阶段应在方案中制定针对性措施，以使试验实施过程中可能损害分析的不规范操作最小化，从而减少偏倚。这里指的不规范操作包括各种类型的方案违背、退出和数据缺失。方案中应考虑一些方法，以减少出现这些问题的频率，以及解决在数据分析中出现的问题。

2.3.1 盲法

盲法或遮蔽是为了限制临床试验的实施和解释时所产生的有意或无意的偏倚，这些偏倚可能源于以下情况的影响：知晓受试者的招募和处理分组、受试者的后续治疗、受试者对治疗的态度、终点评价、退出的处理、从分析中剔除数据，等等。盲法的根本目标是防止知晓处理分组，直到所有产生偏倚的机会都消失。

在双盲试验中，所有受试者及参与受试者的治疗或临床评价的研究者和申办方人员，包括确定受试者资格、评价终点或评价方案依从性的任何人，均不知道受试者所接受的治疗。在整个试验实施过程中，这种盲态要始终保持，只有当数据被清理到可接受的质量水平时，才可对适当的人员揭盲。如果需要对不参与受试者的治疗或临床评价的申办方人员揭盲处理编码（如生物分析学家、稽查员、参与严重不良事件报告的人员），申办方应该制定严格的标准操作规程，以防止处理编码的不当传播。在单盲试验中，研究者和/或他的成员知道处理分组信息，但受试者不知道，反之亦然。在开放试验中，所有的人都可能知道处理分组信息。双盲试验是最优方法，它要求试验所采用的处理在使用前或使用期间均无法被识别出来（如外观、味道等），且在整个试验期间均适当地保持盲态。

达到理想的双盲会有很多困难：有些处理可能具有完全不同的性质，例如，手术和药物治疗；两种药物可能具有不同的剂型，虽然使用胶囊可以令它们无法被区分，但改变剂型可能会改变药代动力学和/或药效学的特性，因此需要建立制剂的生物等效性；两种处理的每日用法可能不同。这些情况下，使用“双模拟”（见词汇表）技术是实现双盲条件的一种方法，该技术有时会强制实施一种非同寻常的使用方案，使得受试者的积极性和依从性受到负面影响。伦理上的困难也可能会干扰该技术的应用，例如手术过程的模拟。无论如何，应当努力克服这些困难。

某些临床试验的双盲性质可能由于明显的处理诱导效应而遭到部分破坏。这种情况下，使研究者和有关申办方人员对某些检验结果（如所选择的临床实验室测量）保持盲态，可以使盲法得到改善。使偏倚最小化的类似方法（见下文）应当在开放试验中考虑，例如独特的处理效应无法对患者设盲的试验。

如果双盲试验不可行，则应考虑用单盲方案。有些情况下，只有开放试验在实践上或伦理上是可行的。单盲和开放试验更具灵活性，但特别重要的是，研究者知道了下一个受试者的处理不应影响入组受试者的决定，即该决定应在知道随机化处理之前做出。对于这些试验，应考虑使用中央随机化方法，如采用电话随机化管理处理的分配。此外，应该由不参与治疗受试者并对处理保持盲态的医务人员进行临床评价。在单盲或开放试验中，应尽一切努力使各种已知的偏倚来源降到最低，并且应采用尽可能客观的主要指标。应在方案中解释所采用的盲态程度的原因，以及所采取的使偏倚最小化的措施。例如，申办方应当有严格的标准操作规程，以保证在清理数据库以供分析之前，适当限制对处理编码的获取。

只有经治医师认为对某一受试者的治疗有必要知道其处理分配时，才应考虑对该受试者破盲。无论什么原因导致的任何有意或无意地破盲都应该在试验结束时给予报告和解释。处理分配的揭盲过程及时间都应该记录在案。

本文件中，数据的盲态审核（见词汇表）是指在试验完成（对最后一位受试者的最后一次观察）到揭盲之间的这段时间内对数据的检查。

2.3.2 随机化

在临床试验中，随机化将机会元素引入到受试者的处理分配中。在试验数据的后续分析期间，它为定量评价与处理效应有关的证据提供了坚实的统计基础。它倾向于使各处理组的已知和未知的预后因素分布相似。与盲法结合，在受试者的选择和分配时，随机化有助于避免因处理分配的可预测性而可能出现的偏倚。

临床试验的随机化列表记录了施与受试者处理的随机分配，其最简单的方式是处理的序列表（或交叉试验中的处理序列），或按受试者编号对应的编码。有些试验，如具有筛选阶段的试验，可能使问题复杂一些，但是预先计划的受试者的处理分配或处理序列应是唯一的。不同的试验设计需要不同的程序来生成随机化列表。随机化列表应当有重现性（如果需要）。

虽然无限制条件的随机化是一种可接受的方法，但区组随机一般具有某些优势，它有助于增加处理组间的可比性，特别是当受试者特征可能随时间变化时，例如由于招募策略改变引起的变化。它还能更好地保证各处理组的样本量几乎相等。在交叉试验中，它提供了获得具有更高效率和更易于解释的平衡设计的方法。选择区组长度时需注意，既要足够短以限制可能的不平衡，又要足够长以避免对区组序列末尾的可预测性。区组长度通常应对研究者及其他有关人员保持盲态；使用两种或多种区组长度与每个区组随机选择长度，可达到同样目的。（理论上，在双盲试验中，可预测性并不重要，但药物的药理作用可能提供猜测机会。）
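下面给出一个极简示意，说明如何用随机选择的区组长度生成可重现的区组随机化列表。函数名 `blocked_randomisation`、区组长度 (4, 6) 以及种子值均为本示例的假设，并非本指南规定的做法；实际试验应使用经过验证的随机化系统。

```python
import random

def blocked_randomisation(n_subjects, treatments=("A", "B"), block_sizes=(4, 6), seed=None):
    """生成区组随机化列表；每个区组的长度从 block_sizes 中随机选取。

    假设：每个区组长度都是处理数的整数倍，以保证区组内各处理数量相等。
    """
    rng = random.Random(seed)  # 固定种子可使列表重现
    schedule = []
    while len(schedule) < n_subjects:
        size = rng.choice(block_sizes)
        block = list(treatments) * (size // len(treatments))
        rng.shuffle(block)
        schedule.extend(block)
    # 最后一个区组可能被截断，因此末尾允许存在小的不平衡
    return schedule[:n_subjects]

schedule = blocked_randomisation(20, seed=2024)
```

在两个处理、区组长度为4或6的设定下，任一时点两组人数之差不会超过3；同时由于区组长度不固定，区组末尾的分配更难被猜中。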

对于多中心试验（见词汇表），随机化过程应集中统一组织。提倡每个中心有一个单独的随机方案，即按中心分层或为每个中心分配若干完整的区组。更一般地，按照基线测量的重要预后因素（如疾病的严重程度、年龄、性别等）进行分层，可保障层内的平衡分配，这种方法在小型试验中潜在益处更大。分层因素一般不超过三个，否则实现平衡不仅困难，而且麻烦。应用动态分配程序（见下文）可能有助于同时在多个分层因素之间达到平衡，只要可以调整其余试验流程以适应这类方法。应当在后续的分析中对分层随机化的因素加以考虑。

进入试验的下一个随机化受试者，应该接受对应于随机化列表（如果随机化是分层的，则在相应的层中）中下一个号码的处理。只有当已经确认下一个受试者进入到试验的随机化阶段时，才能给受试者分配合适的号码和相关处理。具有增加可预测性的随机化细节，如区组长度，不应包含在试验方案中。随机化列表本身应该由申办方或独立方安全存档，以确保整个试验过程维持盲态。在试验期间获取随机化列表应该考虑在紧急情况下为任何受试者破盲的可能性。破盲应遵循的程序、必要的文件以及受试者后续的处理和评价均应在方案中写明。

动态分配也是一种选择，该方法根据当前已分配的处理的平衡情况进行处理分配，对于分层试验，处理分配视受试者所属层内的平衡情况而定。应当避免确定性的动态分配程序，应当为每个处理分配纳入适当的随机化要素。应尽一切努力保持试验的双盲状态。例如，仅限于中央试验办公室知道处理编码，并由办公室通过电话联系来控制动态分配。这种方法允许对入选标准进行额外检查，并会建立试验入组的记录，这些信息对某些类型的多中心试验具有价值。随后会启用双盲试验的预包装和贴标签的药品供应系统，但它们的使用顺序不再是依次的。最好使用适当的计算机算法使中央试验办公室的人员对处理编码保持盲态。当考虑动态分配时，应该仔细评价物流的复杂性以及对分析的潜在影响。
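下面的草图演示带随机成分的最小化法（一种常见的动态分配方式）。指南并未指定具体算法；这里采用的极差不平衡度量、`p_favoured=0.8` 的偏向概率以及函数名均为示例假设。

```python
import random

def minimisation_assign(new_subject, allocated, treatments=("A", "B"),
                        p_favoured=0.8, rng=random):
    """为新受试者选择处理，使各分层因素上的不平衡尽量小。

    new_subject : dict，分层因素 -> 水平，例如 {"sex": "F"}
    allocated   : list，元素为 (受试者dict, 已分配处理)
    p_favoured  : 选择“最平衡”处理的概率；小于1即保留随机成分
    """
    imbalance = {}
    for candidate in treatments:
        total = 0
        for factor, level in new_subject.items():
            counts = {t: 0 for t in treatments}
            for subject, treatment in allocated:
                if subject.get(factor) == level:
                    counts[treatment] += 1
            counts[candidate] += 1  # 假设把新受试者分到 candidate
            total += max(counts.values()) - min(counts.values())
        imbalance[candidate] = total
    best = min(imbalance.values())
    favoured = [t for t in treatments if imbalance[t] == best]
    if len(favoured) == len(treatments):
        return rng.choice(list(treatments))  # 完全平衡时等概率分配
    if rng.random() < p_favoured:
        return rng.choice(favoured)
    return rng.choice([t for t in treatments if t not in favoured])

allocated = [
    ({"sex": "F", "severity": "high"}, "A"),
    ({"sex": "F", "severity": "low"}, "A"),
]
new_subject = {"sex": "F", "severity": "high"}
choice = minimisation_assign(new_subject, allocated, rng=random.Random(1))
```

将 `p_favoured` 设为1会得到确定性的分配程序，这正是上文建议避免的情形；保留小于1的概率可为每次分配引入随机化要素。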

3. 试验设计的考虑

3.1 设计类型

3.1.1 平行组设计

对于确证性试验，最常见的临床试验设计是平行组设计，该设计将受试者随机分配到两组或多组中的一组，每组采用不同的处理。这些处理包括一个或多个剂量的研究产品，以及一个或多个对照处理，如安慰剂或/和阳性对照。该设计的假设比大多数其它设计简单，但与其它设计一样，可能会有使分析和解释复杂化的额外试验特征，如协变量、随时间的重复测量、设计因素之间的交互作用、方案违背、脱落（见词汇表）、退出等。

3.1.2 交叉设计

在交叉设计中，每个受试者被随机分到两个或多个处理序列，因此处理间的比较相当于自身对照。这种简单策略之所以有吸引力，主要因为它减少了满足检验效能所需的受试者，有时减少的程度相当可观。2×2 交叉设计是最简单的，该设计通常在先后两个处理周期中安排一个洗脱期，每个受试者以随机顺序在每个处理周期接受两个处理中的其中一个。最常见的扩展设计是n个周期和n（>2）个处理，每个受试者先后接受所有 n 个处理。此类设计形式多样，例如，每个受试者接受n（>2）个处理中的一个子集，或者对一个受试者重复给予处理。

交叉设计有很多问题可导致其结果无效，主要困难在于残留效应，即在后继处理周期内的前序处理的残余影响。使用相加模型时，不同的残留效应将使处理间的直接比较产生偏倚。对于2×2设计，统计上无法将残留效应从处理与周期的交互作用中区分开来，并且因为相应的对比是“受试者之间”，故检验这两个效应中任何一个都缺乏检验效能。这一问题在高阶设计中并不严重，但不能完全消除。

因此，使用交叉设计重要的是要避免残留效应，最好的办法是在充分了解疾病领域和新药的基础上有选择地和谨慎地使用该设计，诸如针对病情稳定的慢性病；治疗周期内可充分发挥药物的相关效应；洗脱期足够长以使药物效应完全消退等。应该在试验前利用已有信息及数据确定是否可满足这些条件。

交叉试验还有一些需要密切注意的问题，其中，受试者失访导致的分析和解释的复杂化最值得关注。另外，残留效应的潜在作用导致后续处理周期所发生的不良事件很难判断是哪种处理所致。这些问题以及其它问题在ICH E4中已有阐述。交叉设计一般应严格限于预期仅有少数失访的试验。

采用2×2交叉设计验证相同药物的两种制剂的生物等效性甚为常用，往往令人满意，尤其是以健康志愿者为对象的试验，如果两个周期间的洗脱时间足够长，极不可能发生相关药代动力学指标的残留效应。不过，在分析期间基于获得的数据核实这一假设仍然非常重要，例如，通过在每个周期开始时未检测到药物来证实无残留效应。

3.1.3 析因设计

在析因设计中，通过使用不同的处理组合可以同时评价两个或多个处理。最简单的例子是2×2析因设计，受试者被随机分配到两个处理 A和B的四种可能组合之一，即单独A、单独B、既有A又有B、既无A又无B。该设计多以检验A和B的交互作用为特定目的。如果基于检验主效应计算样本量，则交互作用统计检验的检验效能可能不足。当该设计被用于检验A和B的联合效应时，特别是如果两者可能被一起使用，这一考虑尤为重要。

析因设计的另一个重要用途是，建立同时使用处理C和D时的剂量-反应特征，特别是在先前试验中每种单一疗法的某个剂量的有效性已被证实的情况。设C的剂量数为m（通常包括零剂量，即安慰剂），相似的D的剂量数为n，整个设计由m×n 个处理组构成，每个处理组为一种不同的C和D的剂量组合，则应用响应面的结果估计可以帮助确定临床使用的C和D剂量的恰当组合（见ICH E4）。

某些情况下，如评价两种处理的有效性所需的受试者数量与单独评价任一种处理的有效性所需的受试者数量相同时，2×2 设计可能会更高效地利用受试者，这一策略已经被证实对非常大型的死亡率试验颇有价值。该方法的效率和可靠性取决于处理A和B之间不存在交互作用，使得A和B对主要有效性指标的主效应服从相加模型，因此，无论A的效应是否叠加在B的效应之上，A的效应都几乎相同。与交叉试验一样，应在试验前利用先前的信息和数据，确立该条件很可能得到满足的证据。

3.2 多中心试验

开展多中心试验主要有两个原因。首先，多中心试验是一种更加高效地评价新药的可接受的方法；某些情况下，为在合理的时间框架内获得足够的受试者以满足试验目的，它可能是唯一可行的方法。原则上，在临床研发的任何阶段均可开展这种性质的多中心试验。多中心试验可能有几个中心，每个中心的受试者数量较大；也可能有很多中心，每个中心只有很少的受试者，比如罕见病研究。

其次，设计成多中心（和多个研究者）试验主要是为研究结果的后续推论提供更好的基础，因为从更广泛的人群中招募受试者和呈现更宽泛的使用药物的临床环境，从而呈现出更典型的未来用药场景。这种情况下，许多研究者的参与也可提供更宽泛的药物价值临床判断。此类试验在药物研发后期将成为确证性试验，可能有大量的研究者和中心参与。为增强可推论性（见词汇表），多中心试验有时会在许多不同国家实施。

要想充分解释和外推多中心试验结论，所有中心实施研究方案的方式应该是明确的和相似的。样本量和检验效能的计算通常基于各中心的处理间差异是同一个量的无偏估计的假设，因此，制定共同研究方案并予以实施很重要。试验的实施流程应该尽可能标准化。通过研究者会议、试验前的人员培训和试验期间的严密监查，可以减少评价标准和方法的不一致性。良好设计的目的通常是实现每个中心内各处理组的受试者分布相同，而良好管理可以对该目的起到支持作用。应避免中心间的病例数相差太大以及个别中心病例数太少，这一考虑的好处会在后期探查中心间处理效应的异质性时显示出来，因为这样可以减少处理效应不同加权估计之间的差异。（这一点并不适用于所有中心病例数都非常少、且分析时不考虑中心因素的试验。）如果不采取这些预防措施，加之对结果同质性的质疑，会使多中心试验的价值降低，有时甚至严重到不能为申办方的主张提供令人信服的证据的地步。

最简单的多中心试验是每位研究者负责在一家医院招募受试者，所以，“中心”是由研究者或医院唯一确定的。可是，很多试验会更复杂一些，例如，一个研究者可能从几家医院招募受试者；一个研究者可能代表一个临床医生团队（参与研究者），他们或从一家医院所辖的几个诊所，或从几家相关的医院招募受试者。只要对统计模型中关于中心的定义有疑义，方案中的统计章节（见第5.1章节）就应在特定试验背景下明确定义该术语（例如，按研究者、场所或地区）。多数情况下，根据研究者定义中心较为可行，ICH E6在这方面提供了相关指南。定义中心的目的是使影响主要指标测量的因素和处理的影响达到同质，以免因此引起质疑。任何将中心合并起来进行分析的规则应尽可能在方案中合理阐述并预先规定，但是，任何基于此方法的决策都应始终在盲态下做出，如盲态审核。

方案中应该描述处理效应的估计和检验的统计模型。主要处理效应估计可首先使用包含中心效应的模型，但不包含处理与中心的交互项。如果处理效应中心间是同质的，则在模型中常规地包含交互项会降低对主要效应的检验效率；如果确实存在处理效应的异质性，则对处理效应的解释是有争议的。

某些试验，如大型的死亡率试验，每个中心只有很少受试者，设想中心对主要或次要指标有任何影响都是缺乏依据的，因为中心因素的影响不可能代表临床重要性。还有一些试验可能从一开始就会认识到每个中心有限的受试者使得统计模型中包含中心效应变得不切实际。这种情况下，模型中不应包含中心项，而且也没有必要按中心进行分层随机化。

对于每个中心都有充足的受试者的试验，如果发现阳性处理效应，通常应探索不同中心间处理效应的异质性，因为这可能影响结论的外推性。通过各中心结果的图示方法，或通过对中心与处理间交互作用的统计检验，可能会发现明显的异质性。对交互效应做统计检验时，需认识到其检验效能不高，因为试验是基于探测处理的主效应而设计的。

如果发现处理效应的异质性，则应当谨慎地加以解释，并应积极尝试从试验管理的其他特征或受试者特征方面来寻找原因。这样的原因通常会提示适当的进一步分析和解释。在缺乏原因的情况下，一旦证实处理效应的异质性，例如，通过明显的定量交互作用（见词汇表），意味着处理效应可能需要另一种估计，比如给中心不同赋权以保障处理效应估计的稳健性。理解定性交互作用（见词汇表）的异质性甚至更为重要，当未能找到原因时，要想可靠地预测处理效应，可能需要进一步开展临床试验。

以上针对多中心试验的讨论都是基于采用固定效应模型的。混合模型也可用于探索处理效应的异质性，它把中心效应和中心与处理间的交互效应看作是随机的，尤其适合于中心数量特别多的情况。

3.3 比较的类型

3.3.1 优效性试验

科学地讲，通过安慰剂对照试验显示优于安慰剂，或通过显示优于阳性对照处理，或显示剂量-反应关系，所得到的疗效是最可信的。此类试验被称为“优效性”试验（见词汇表）。本指南一般以优效性试验为假定，除非另有明确说明。

对于严重疾病，如果存在经优效性试验验证的有效的治疗方法，采用安慰剂对照试验可能被认为是有悖伦理的。这种情况下，应当考虑科学合理地采用阳性对照。安慰剂对照和阳性对照的适用性应当针对每个试验逐一考虑。

3.3.2 等效性或非劣效性的试验

某些情况下，研究产品与参照处理相比的目的并非为了显示优效性。此类试验根据其目的分为两大类，一类是“等效性”试验（见词汇表），另一类是“非劣效性”试验（见词汇表）。

生物等效性试验属于前一类。某些情况下，出于其他监管原因也进行临床等效性试验，例如，当化合物不被吸收并因此不存在于血液中时，验证仿制产品与已上市产品的临床等效性。

很多阳性对照试验用于验证研究产品的有效性非劣效于阳性对照药，因此属于后一类。另一种可能是在试验中将研究药品的多个剂量与标准药品的推荐剂量或多个剂量进行比较。这种设计的目的是同时显示研究产品的剂量-反应关系，并将研究产品与阳性对照进行比较。

阳性对照等效性或非劣效性试验也可引入安慰剂对照，从而在一个试验中设定多个目标，例如，这种设计在验证优效于安慰剂的同时，还可以评价相对于阳性对照的有效性与安全性的相似程度。众所周知，采用不包含安慰剂或不设置新药多个剂量的阳性对照等效性（或非劣效性）试验会面临一些困难。与优效性试验相比，此类试验隐性缺乏内部效度，因此必须进行外部验证。等效性（或非劣效性）试验本质上并不保守，因此，试验设计或实施中的许多缺陷往往会使结果偏向等效的结论。由于这些原因，这些试验的设计特点应受到特别关注，它们的实施需要特别小心，例如，尽量减少违反入选标准、不依从、退出、失访、数据缺失和其它偏离方案的发生率，并使它们对后续分析的影响降至最低。

应谨慎选择阳性对照。恰当的阳性对照应该是一种被广泛使用的疗法，其针对相关适应症的疗效已在良好设计和良好记录的优效性试验中得到了量化确认，并且能够可靠地预期在将要实施的试验中显示出相似的疗效。为此，新试验应该与以前实施且明确显示出临床相关疗效的优效性试验具有相同的重要设计特征（主要指标、阳性对照的剂量、入排标准等），且考虑与新试验相关的医学或统计学实践的进展。

在试验方案中，一个关键问题是要把证明等效性或非劣效性的意图清晰明确地表述出来。方案中应规定一个等效界值，该界值被视为临床可接受的最大差异，并且应当小于在阳性对照优效性试验中所观察到的差异。对于阳性对照等效性试验，需规定等效界值的上限和下限；而对于阳性对照非劣效性试验，仅需规定界值下限。等效界值的选择应具备临床的合理性。

统计分析通常采用置信区间方法（见第5.5章节）。对于等效性试验，应当使用双侧置信区间。如果置信区间完全落在等效界值之内，可推断为等效。在实操上，该法相当于双单侧检验方法，其（复合）无效假设是处理间差异在等效界值之外，（复合）备择假设是处理间差异在等效界值之内。由于两个无效假设无重叠，故I类错误可控。对于非劣效性试验，应当使用单侧置信区间。置信区间方法对应一个单侧假设检验，其无效假设是处理间差异（试验品减去对照品）等于或小于等效界值的下限，而备择假设是处理间差异大于等效界值下限。I类错误的选择应与采用单侧还是双侧方法分开考虑。样本量计算应当基于这些方法（见第3.5章节）。
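
置信区间法与双单侧检验的对应关系可以用下面的草图说明。这里假设处理间差异的估计量近似服从正态分布且标准误已知；函数名、界值 ±3 和 alpha=0.05 均为示例性假设，实际试验中应在方案里预先规定。

```python
from statistics import NormalDist

def equivalence_by_ci(diff, se, lower_margin, upper_margin, alpha=0.05):
    """双侧 (1 - alpha) 置信区间完全落在等效界值之内时推断为等效（正态近似）。"""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lo, hi = diff - z * se, diff + z * se
    return (lo, hi), (lower_margin < lo and hi < upper_margin)

def tost_p_value(diff, se, lower_margin, upper_margin):
    """双单侧检验：取两个单侧 P 值中较大者。"""
    nd = NormalDist()
    p_lower = 1 - nd.cdf((diff - lower_margin) / se)  # 无效假设：差异 <= 下限
    p_upper = nd.cdf((diff - upper_margin) / se)      # 无效假设：差异 >= 上限
    return max(p_lower, p_upper)

ci, equivalent = equivalence_by_ci(diff=0.5, se=1.0, lower_margin=-3.0, upper_margin=3.0)
print(ci, equivalent)
print(tost_p_value(0.5, 1.0, -3.0, 3.0))
```

在这一近似下，双侧 95% 置信区间落在界值之内，等价于两个单侧检验各自在 2.5% 水平上拒绝无效假设。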

在研究产品与阳性对照之间无差异的无效假设下，如果基于观察到无显著差异的检验结果，做出等效性或非劣效性的结论是不合适的。

在选择分析数据集时也存在一些特殊问题。处理组或对照组退出或脱落的受试者都倾向于缺乏应答，因此使用全分析集（见词汇表）的结果证实等效性可能存在偏倚（见第5.2.3章节）。

3.3.3 剂量-反应关系的试验

新研究产品的剂量与应答如何相关，是一个在研发的所有阶段通过各种方法都可获得答案的问题（见ICH E4）。剂量反应试验可服务于许多目的，相对重要的有：有效性的确证；剂量反应曲线的形状和位置的研究；适宜初始剂量的估计；个体剂量调整的最优策略确定；最大剂量的确定（超出该剂量不可能额外获益）。达到上述目的需要收集研究中各种剂量的数据，包括安慰剂（零剂量）。为此，需用到估计剂量反应关系的方法，包括统计检验以及同样重要的置信区间构建和图示方法。假设检验可能需要根据剂量的自然顺序或关于剂量-反应曲线的形状（如单调性）的特定问题做出调整。应当在方案中提供详细的统计分析计划。

3.4 成组序贯设计

采用成组序贯设计便于进行期中分析（见第4.5章节和词汇表）。成组序贯设计虽然不是用于期中分析的唯一可接受的设计类型，却是最常用的，因为在试验期间以周期性间隔评价不同分组的受试者的结局比在获得整个试验每一个受试者数据后进行评价更为可行。在获得处理结局和受试者的处理分配（如揭盲，见第4.5章节）的信息之前，应充分说明统计方法。独立数据监查委员会（见词汇表）可对来源于成组序贯设计的数据实施审查或进行期中分析（见第 4.6章节）。该设计不仅已被最广泛地、成功地应用于大型、长周期的以死亡率或主要非致死性结局为终点的试验，它在其它方面的应用也在增加。尤其是，人们已经认识到所有试验中都必须监查安全性，因此，为了出于安全原因提早终止试验而制定正式流程的必要性往往是需要考虑的。

3.5 样本量

临床试验的受试者例数应足够大，以对所提出的问题提供可靠答案。样本量通常由试验的主要目的确定，如果由其它要素确定，则应明确说明理由。例如，基于安全性问题或要求，或者基于重要的次要目的确定的样本量，可能比基于主要有效性问题确定的样本量需要更多的受试者（例如，见ICH E1A）。

一般的样本量确定方法应考虑以下要素：主要指标、检验统计量、无效假设、所选剂量下的备择（“工作”）假设（所选受试者人群中在所选剂量下检测出或拒绝的处理间差异）、错误拒绝无效假设的概率（I类错误）、错误地不拒绝无效假设的概率（II类错误），以及应对退出和违背方案的处理方法。某些情况下，以事件率为评价检验效能的主要手段，此时需要做出一些假设，以从所需的事件数推算出试验的最终样本量。

应在方案中给出计算样本量的方法，以及在计算中使用的任何估计量（如方差、均值、反应率、事件率、待检测的差异）。也应该给出这些估计的依据。研究这些假设的偏离对样本量估计的敏感性很重要，而根据偏离假设的合理范围给出对应的样本量范围则是一种方便可行的方法。在确证性研究中，假设通常应基于公开发表的数据或早期试验的结果。对于待检测的处理间差异，可依据在患者管理中对具有临床相关性的最小效应的判断，也可依据对新处理的预期效应的判断，相比之下后者的预期效应更大。通常I类错误概率设在5%或者更小，或者由多重比较所需要的任何调整来决定；检验假设的事先合理性以及结果的预期影响可能会影响I类错误的精确选择。II类错误的概率通常设在10%到20%之间，申办方通常愿意让该值尽可能低，尤其当试验难以或不可能重复时。某些情况下，采用与常规的I类和II类错误水平不同的值也可能被接受，甚至更可取。
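
作为说明，下面用常见的正态近似公式计算两组均数比较的每组样本量，并按标准差的合理取值范围给出对应的样本量范围。公式适用于双侧检验、等方差、1:1 分配的情形；待检测差异 5 和标准差 8、10、12 均为虚构的示例数值。

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.90):
    """每组样本量 n = 2 * sigma^2 * (z_{1-alpha/2} + z_{power})^2 / delta^2，向上取整。"""
    z = NormalDist().inv_cdf
    n = 2 * (sigma * (z(1 - alpha / 2) + z(power)) / delta) ** 2
    return ceil(n)

# 敏感性：标准差偏离假设值时样本量的变化
for sigma in (8, 10, 12):
    print(sigma, n_per_group(delta=5, sigma=sigma))
```

样本量与标准差的平方成正比，因此对方差假设的偏离会明显改变所需样本量。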

样本量应是主分析所需的受试者数量。如果这是“全分析集”，则效应大小的估计与符合方案集（见词汇表）相比，可能需要降低。这是为了考虑因纳入退出治疗或依从性差的患者数据而造成的处理效应稀释。相应地，关于变异的假设可能也需要修改。

等效性或非劣效性试验（见第3.3.2章节）的样本量通常应基于如下目的：获得处理间差异的置信区间，以显示处理间差异至多为临床可接受的差异。如果等效性试验的检验效能是在假设真实差异为0的条件下确定的，那么当真实差异不为0时，达到这一检验效能所需的样本量会被低估。如果非劣效性试验的检验效能是在假设0差异的条件下确定的，那么当试验产品的效应低于对照时，达到这一检验效能所需的样本量会被低估。“临床可接受的”差异的选择需要合理说明它对将来患者的意义，并且可能小于上文提到的优效性试验旨在证明的“临床相关的”差异。
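
下面的草图用正态近似说明：非劣效性试验中，若真实差异小于 0（试验产品略差于对照），按“0 差异”假设计算的样本量会偏小。界值 4、标准差 10、单侧 alpha=0.025 都是示例性假设。

```python
from math import ceil
from statistics import NormalDist

def n_noninferiority(margin, sigma, true_diff=0.0, alpha=0.025, power=0.90):
    """非劣效性试验每组样本量（单侧检验、等方差、1:1 分配，正态近似）。

    margin 为正数，界值下限为 -margin；true_diff 为试验品减对照品的真实差异。
    """
    distance = true_diff + margin        # 真实差异到界值下限的距离
    if distance <= 0:
        raise ValueError("真实差异不优于界值下限时无法达到所需检验效能")
    z = NormalDist().inv_cdf
    return ceil(2 * (sigma * (z(1 - alpha) + z(power)) / distance) ** 2)

print(n_noninferiority(margin=4, sigma=10, true_diff=0.0))
print(n_noninferiority(margin=4, sigma=10, true_diff=-1.0))
```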

成组序贯试验不能预先确定确切的样本量，因为它依赖于机会作用以及所选择的终止试验的准则和真实的处理间差异。终止准则的设计应该考虑后续样本量的分布，通常表达为预期样本量和最大样本量。

当事件率低于预期或变异大于预期时，在不揭盲数据或不进行处理间比较的情况下，可使用样本量重新估计的方法（见第4.4章节）。

3.6 数据采集及处理

数据的收集和研究者向申办方传输数据可通过各种媒介进行，包括纸质病例报告表、远程现场监查系统、医疗计算机系统和电子传输。无论采用何种数据收集工具，所收集信息的形式和内容都应完全符合方案，并应在临床试验实施前确定。应注重分析计划的实施所必须的数据，包括确认方案依从性或确定重要方案违背所需要的背景信息（如与服用剂量有关的时点评价）。 “缺失值”应该与“0值”或“特征缺失”区分开来。

从数据收集到数据库最终确定的过程应该按照 GCP 进行（见ICH E6，第5章节）。具体来说，需要及时可靠的程序用于记录数据和纠正错误与遗漏，以确保交付高质量的数据库，并通过实施计划的分析达到试验目的。

4. 试验实施的考虑

4.1 试验监查和期中分析

按照方案认真实施临床试验，对结果的可靠性具有重大影响（见ICH E6）。仔细监查可以确保尽早发现困难，并将它们的发生和复发减至最小。

由制药企业资助的确证性临床试验，通常有两种截然不同的监查类型。一种关注试验质量的监督，另一种涉及破盲以进行处理间的比较（即期中分析）。两种试验监查，除人员职责不同外，还涉及不同类型试验数据和信息的获取，因此需用不同的规则控制潜在的统计和操作偏倚。

出于监督试验质量的目的，试验监查中所涉及的检查可能包括：是否遵循方案，累积数据是否可接受，计划的收集目标是否达到，设计假设是否合适，以及在试验中保留患者是否成功，等等（见第4.2至4.4章节）。这种类型的监查既不需要获取比较处理效应的信息，也不需要对数据进行揭盲，因此对I类错误没有影响。出于这一目的对试验进行监查是申办方的职责（见ICH E6），可由申办方或申办方选择的独立小组来进行。这种类型的监查周期一般是从选择试验现场开始，到收集和清理最后一位受试者的数据结束。

其他类型的试验监查（期中分析）涉及到比较处理结果的累积。期中分析需要揭盲（即破盲）获取处理组分配信息（实际的处理分配或者各组分配的标识）以及比较处理组的汇总信息。这需要在方案（或者首次分析之前的适当修订）中包含期中分析的统计计划，以防止某些类型的偏倚，见第4.5 和 4.6 章节的讨论。

4.2 纳入与排除标准的更改

纳入与排除标准应按方案的规定保持恒定，贯穿受试者招募期。偶尔有些改变是允许的，例如，在长周期试验中，从试验外部或期中分析所获得的对医学知识新的认识，可能建议修改入组标准。监查人员发现违背入组标准情况经常发生，或者由于入组标准过严导致非常低的招募率，也都可能是修改入组标准的理由。修改入组标准应在不破盲的情况下进行，并通过方案修订进行描述，修订的方案应涵盖任何统计学方面的变动，如不同事件率所致的样本量调整，或者分析计划的修改，如根据修改的纳入/排除标准进行分层分析。

4.3 入组率

在受试者入组时间较长的试验中，应监查入组率，如果它明显低于预期水平，应该查明原因并采取补救措施，以确保试验的检验效能，并减轻对选择性入组和其他质量问题的担忧。这些考虑适用于多中心试验的各个中心。

4.4 样本量调整

在长周期试验中，通常有可能对原设计和样本量计算所依据的假设进行检查。如果试验设计的某些重要规定是根据初步的和/或不确定的信息做出的，这种检查尤其重要。对盲态数据进行期中检查可能会发现总应答的方差、事件率或生存状态不如预期。此时，可能需要通过适当修改假设来修正样本量，还应在方案修订和临床研究报告中说明其合理性并记录在案。应该解释为保持盲态所采取的措施及其对I类错误和置信区间宽度的影响（如果有）。只要可能，都应在方案中表述样本量再估计的潜在需要（见3.5章节）。
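
盲态下的样本量再估计可以粗略示意如下：不区分处理组，用全部已累积数据的单样本方差代替设计阶段假设的方差，再重新计算样本量。是否对该方差做校正、采用何种公式，都应在方案中预先说明；此处不做校正并沿用两组均数比较的正态近似公式，仅是一种假设性的简化，数据也是虚构的。

```python
from math import ceil
from statistics import NormalDist, variance

def reestimate_n_per_group(blinded_values, delta, alpha=0.05, power=0.90):
    """用盲态合并数据的样本方差重新估计每组样本量（双侧检验，正态近似）。"""
    s2 = variance(blinded_values)        # 不使用处理分配信息的单样本方差
    z = NormalDist().inv_cdf
    return ceil(2 * s2 * (z(1 - alpha / 2) + z(power)) ** 2 / delta ** 2)

# 虚构的盲态期中数据
interim = [12.1, 30.5, 18.2, 41.0, 25.3, 8.7, 33.9, 21.4, 15.6, 37.8]
print(reestimate_n_per_group(interim, delta=5))
```

当处理间确有差异时，合并数据的方差会略高于组内方差，因此这一做法通常偏保守。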

4.5 期中分析和提早终止试验

期中分析是指，在试验正式完成之前的任何时间，为比较处理组间的有效性或安全性而进行的任何分析。因为这些比较的次数、方法及结果影响试验的解释，因此所有期中分析都应当预先仔细计划并在方案中阐明。有些特殊情况，期中分析可能在试验开始后才发现有必要实施。对于这种情况，补充定义期中分析的方案修订应在分析数据揭盲之前。当期中分析用于决定是否终止试验时，通常会采用成组序贯设计，该设计以统计监查计划作为准则（见第3.4章节）。对于这种期中分析，出现以下情况可以提早终止试验：研究处理的优效性已被证实；已不太可能证实存在有意义的处理间差异；发生了不可接受的不良反应。一般来说，与安全性监查相比，通过有效性监查来提早终止试验要求更多的证据，即边界更保守。当试验设计和监查目的涉及多个终点时，应考虑多重性问题。

方案中应描述期中分析计划，或至少描述一些相关的考虑，如是否使用灵活的α消耗函数方法，并在第一次期中分析前，在修订的方案中提供进一步的细节。终止试验的准则和特性应在方案或修订的方案中清晰阐述。其他重要指标的分析对提早终止的潜在影响也应考虑。如果试验设有数据监查委员会（见第4.6章节），上述材料应由其撰写或批准。偏离计划总有可能使试验结果失效。如果试验需要修正，任何统计方面的相应修改应尽早在方案修订中详细说明，特别是讨论这些修改对任何分析或推断的影响。在统计方面应始终确保控制总I类错误概率。
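
下面示意两种常见的 α 消耗函数（Lan-DeMets 框架下的 O'Brien-Fleming 型和 Pocock 型），给出在信息比例 t 处累计可消耗的I类错误。本指南只提到“灵活的α消耗函数方法”，并未指定函数形式；选择这两种函数、总 alpha=0.05 以及信息比例的取值，都是示例性假设。

```python
from math import e, log, sqrt
from statistics import NormalDist

def obf_type_spending(t, alpha=0.05):
    """O'Brien-Fleming 型消耗函数：2 * (1 - Phi(z_{1-alpha/2} / sqrt(t)))，0 < t <= 1。"""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2)
    return 2 * (1 - nd.cdf(z / sqrt(t)))

def pocock_type_spending(t, alpha=0.05):
    """Pocock 型消耗函数：alpha * ln(1 + (e - 1) * t)，0 < t <= 1。"""
    return alpha * log(1 + (e - 1) * t)

for t in (0.25, 0.5, 0.75, 1.0):
    print(t, round(obf_type_spending(t), 5), round(pocock_type_spending(t), 5))
```

这里只计算累计消耗的 α；由此推导每次分析的终止边界还需要考虑各次检验统计量之间的相关性，通常借助专用软件完成。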

期中分析的执行应该是一个完全保密的过程，因为可能涉及非盲的数据和结果。参与试验实施的所有人员应当对这些分析结果保持盲态，因为他们对试验的态度可能会改变并导致招募患者的特征改变或产生处理间比较的偏倚。除了直接参与执行期中分析的人员之外，这一原则可适用于所有研究人员和申办方所雇佣的人员。研究者应仅被告知继续或终止试验的决定，或实施修订试验程序的决定。

大部分支持研究产品有效性和安全性的临床试验应全部完成计划入组的样本量。只有出于伦理原因，或者出现检验效能不再可接受的情况，试验才可提早终止。然而，人们都知道出于各种原因申办方的药物研发计划需要获取处理间比较的数据，如为其它试验制定计划；另外，仅有一部分试验会涉及到严重威胁生命的结局或死亡率的研究，出于伦理原因可能需要对入组病例的处理效应比较进行连续监查。无论是哪一种情况，为了应对可能引入的潜在统计偏倚和操作偏倚，应当在分析数据揭盲之前，在方案或修订方案中制定期中统计分析计划。

对于许多研究产品的临床试验，特别是那些具有重大公共卫生意义的临床试验，应将监查有效性和/或安全性结局比较的任务委托给外部独立团队，并清楚地描述其职责。通常将该团队称为独立数据监查委员会、数据和安全监查委员会或数据监查委员会。

当申办方充当监查有效性或安全性比较的角色并因此可以获取非盲的比较信息时，应特别注意保护试验的完整性，并适当地管理和限制信息共享。申办方应当确保并记录内部监查委员会遵守书面的标准操作规程，以及含有期中分析结果记录的决策会议纪要被维护。

任何没有恰当计划的期中分析（不论有或没有提早终止试验的影响）都可能导致试验结果的缺陷，并可能降低所得结论的可靠性，因此，应该避免这些分析。如果实施非计划的期中分析，临床研究报告应该解释其必要性，交待破盲的程度，评价所引入偏倚的潜在程度和对结果解释的影响。

4.6 独立数据监查委员会（IDMC）的作用（见ICH E6第1.25和5.5.2章节）

独立数据监查委员会可由申办方组建，每隔一段时间评价临床试验进展、安全性数据和关键有效性指标，并向申办方建议继续、修改或终止试验。该委员会应当有书面的操作规程，并保存所有会议记录，包括期中分析结果；当试验完成时，这些应可供审查。该委员会的独立性旨在控制重要的比较信息的分享，防止临床试验的完整性受到因获取试验信息而造成的不利影响。该委员会是独立于机构审查委员会或独立伦理委员会的实体，它的组成应包括通晓统计学等相关学科的临床试验科学家。

当独立数据监查委员会中有申办方代表时，在委员会的操作规程中应明确规定他们的作用（例如，他们是否能就关键问题进行投票）。由于这些申办方人员将会获得非盲信息，因此这些操作规程还应解决如何控制期中试验结果在申办方组织内散布。

5. 数据分析的考虑

5.1 分析的预先确定

当设计一个临床试验时，数据的最终统计分析的主要特征应该在方案的统计章节进行描述。该章节应包括所提出的主要指标确证性分析的所有主要特征以及解决预期分析问题的方法。对于探索性试验，该章节可描述更一般性的原则和方向。

统计分析计划（见词汇表）可作为独立文件撰写，并在最终确定方案之后完成。该文件可以更加技术性地和详细地阐述方案所述的主要特征（见第7.1章节）。该计划可包括对主要和次要指标以及其他数据进行统计分析的详细程序。统计分析计划应经审核或根据数据盲态审核（见第7.1章节定义）结果更新后，在揭盲前最终确定。最终统计分析计划的确定及随后的揭盲应保留正式记录。

如果盲态审核建议修改方案中所述的主要特征，应记录在修订方案中。否则，根据盲态审核建议考虑更新统计分析计划就足够了。只有方案（包括修订方案）中预设的分析才被认为是确证性的。

在临床研究报告的统计章节中，应该清楚地描述所采用的统计方法，包括临床试验过程中何时做出的方法学决策（见ICH E3）。

5.2 分析集

数据纳入主分析的受试者集应在方案的统计章节进行定义。另外，对试验程序（如导入期）启动的所有受试者进行文档记录可能是有用的。该受试者文档的内容取决于特定试验的详细特征，只要可能，至少应收集人口统计学和疾病状态的基线数据。

如果所有随机入组的受试者都满足全部入组标准，完全遵从所有试验程序且无失访，并能提供完整的数据记录，那么要纳入分析的受试者集是显而易见的。试验设计和实施的目标应该尽可能地接近这一理想状态，但实践中却难以达到这一状态。因此，方案的统计章节应该预先阐述可能影响受试者和分析数据的问题。方案还应该说明旨在减少研究实施中任何预期的且可能影响数据分析的不规则问题的程序，这些不规则问题包括各种类型的方案违背、退出和数据缺失。方案应考虑降低这些问题发生频率的方法以及如何解决数据分析中会发生的问题。在盲态审核期间，应确定针对方案违背分析方法可能的修订。最好是根据发生时间、原因及对试验结果的影响来确定任何重大方案违背。方案违背、数据缺失以及其它问题的发生频率和类型应记录在临床研究报告中，并描述它们对试验结果的潜在影响（见ICH E3）。

关于分析集的确定应遵循以下原则：1）使偏倚减到最小；2）避免I类错误膨胀。

5.2.1 全分析集

意向性治疗（见词汇表）原则是指主分析应包括所有随机化受试者。遵循该原则需要完成所有随机化受试者的随访以获得研究结局。实践中这一理想状态很难达到。在本文件中，术语“全分析集”被用来描述尽可能完整的分析集，即尽可能接近包括所有随机化受试者的意向性治疗的理想状态的分析集。在分析中保持初始随机化对于防止偏倚以及为统计检验提供可靠基础是很重要的。全分析集的使用为许多临床试验提供了一种保守策略。许多情况下，它也可以提供处理效应的估计，这些估计更有可能反映了后续临床实践中观察到的效应。

一些有限的情况可能导致将随机化受试者从全分析集中排除，包括未能满足主要入组标准（入选标准违背），未服用过至少一次试验药物以及缺乏随机化后的任何数据。这些排除应是合理的。只有在以下情况下，未能满足入组标准的受试者可从分析中排除而不会引入偏倚：

（1）在随机化之前评判了入组标准；

（2）入选标准违背可以被完全客观地评价；

（3）所有受试者都接受相同的入选标准违背审查；（在开放试验中或者甚至在双盲试验中，如果在审查之前数据被揭盲，相同的审查就很难保证，所以要强调盲态审核的重要性。）

（4）排除所有确定为特定入组标准违背者。

某些情况下，从所有随机化受试者集中排除任何未服用试验药物的受试者可能是合理的。例如，只要是否开始治疗的决定不可能受到已知分配处理的影响，那么即使排除了这些患者，意向性治疗原则仍得以遵守。其他情况下，可能需要从所有随机化受试者集中剔除任何随机化后无数据的受试者。除非来自这些特定排除的潜在偏倚或任何其它偏倚得到解决，否则任何分析都不是完整的。

当使用受试者全分析集时，随机化后发生的方案违背可能会对数据和结论产生影响，特别是如果它们的发生与处理分配相关时。大多数情况下把这些受试者的数据纳入分析是合适的，这符合意向性治疗原则。接受一次或多次剂量后退出治疗且以后未提供数据的受试者，或失访的受试者，导致了特殊问题的产生，因为不把这些受试者纳入全分析集中可能会破坏这个原则。这种背景下，受试者无论因任何原因失访，其已经获得的、或根据方案中规定的评价时间点随后收集到的主要指标测量数据，都是有价值的。在主要指标是死亡率或严重疾病发病率的研究中，后续数据的收集尤为重要。如何收集此类数据应在方案中描述。从末次观察值结转方法到复杂数学模型的填补技术可尝试用于替代缺失值。用于确保全分析集中每个受试者主要指标测量值可利用的其它方法，可能会要求做出关于受试者结局或更简单的结局（如成功或失败）的一些假设。任何策略的使用都应在方案的统计章节中进行描述并说明合理性，并且所用的任何数学模型所依据的假设均应解释清楚。证实相应分析结果的稳健性也同样重要，特别是所考虑的策略本身可能会导致处理效应有偏估计的情况。
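
文中提到的末次观察值结转可以用几行代码示意。这只是众多填补方法中最简单的一种，可能导致处理效应的有偏估计；用 None 表示缺失值以及函数名 locf 均为示例性假设。

```python
def locf(visits):
    """末次观察值结转：用最近一次非缺失观察值填补其后的缺失值。"""
    filled, last = [], None
    for value in visits:
        if value is not None:
            last = value
        filled.append(last)       # 首次观察之前的缺失值保持为 None
    return filled

print(locf([5.0, None, 4.0, None, None]))
```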

由于一些问题的不可预测性，有时把不规则问题应对方法的详细考虑推迟到试验结束对数据进行盲态审核时可能更可取，如果这样做则需要在方案中加以说明。

5.2.2 符合方案集

受试者的“符合方案”集，有时被称为“有效病例”、“有效性”样本或“可评价的受试者”样本，被定义为全分析集的受试者中对方案更具依从性的子集，并且以符合如下标准为特征：

（1）完成了对治疗方案的某个预先设定的最小暴露量；

（2）可以获得主要指标的测量值；

（3）无任何重大方案违背，包括入组标准违背。

在揭盲之前，应该按照适合于特定试验情况的方式完整定义并记录将受试者排除在符合方案集之外的确切原因。

使用符合方案集可能有最大的机会使新的治疗在分析中显示出额外的有效性，而且最紧密地反映方案中的科学模型。然而，相应的假设检验和处理效应估计可能保守也可能不保守，这取决于试验本身；对研究方案的依从性可能与处理和结局有关，它可能会导致偏倚甚至是严重的偏倚。

应充分识别和总结导致剔除受试者以生成符合方案集和其它方案违背的问题。相关的方案违背可能包括处理分配的错误、使用禁忌药物、依从性差、失访和数据缺失。从发生频率和发生时间方面评估各处理组间这些问题的模式是一种良好实践。

5.2.3 不同分析集的作用

一般说来，证明主要试验结果对选择不同受试者集具有不敏感性是有利的。在确证性试验中，计划对全分析集及符合方案集都进行分析通常是恰当的，这样可以明确地讨论和解释它们之间的任何差异。某些情况下，需要深入探讨用于分析的受试者集的选择对结论的敏感性。当全分析集和符合方案集得出实质上相同的结论时，会增加试验结果的可信度，但应注意，对于排除了大比例受试者的符合方案分析会给试验的整体正确性带来一些疑虑。

在优效性试验（试图验证研究产品更优）和等效性或非劣效性试验（试图验证研究产品具有可比性，见第 3.3.2 章节）中，全分析集和符合方案集发挥的作用不同。在优效性试验中，全分析集用于主分析（除了例外情况），因为它倾向于避免符合方案集所导致的对有效性的过度乐观估计：包含在全分析集中的非依从者一般会降低所估计的处理效应。然而，在等效性或非劣效性试验中，使用全分析集一般不保守，应非常仔细地考虑它的作用。

5.3 缺失值及离群值

缺失数据是临床试验中的一个潜在偏倚来源。因此，应尽一切努力满足方案对数据收集和管理的所有要求。然而，现实中几乎总会有一些缺失数据。虽然如此，只要缺失数据的处理方法合理，尤其是在方案中预先定义了这些方法，则试验可以被认为是可靠的。在盲态审核期间，可以更新统计分析计划，完善这些方法的定义。遗憾的是，没有可推荐的普遍适用的缺失数据处理方法。应该对缺失数据的处理方法做敏感性研究，特别是当缺失数据的比例较大时。

应采用类似的方法探索离群值的影响，它们的统计定义在某种程度上是主观的。只有从医学上和统计上都认为是合理的，把某一特定值明确地确定为异常值才最具说服力，而且医学方面通常会定义适当的操作程序。在方案或统计分析计划中预先设定的有关离群值的程序应当不倾向任何处理组。同样，在盲态审核期间可以有效地更新这方面的分析。如果在试验方案中未预先规定应对离群值的程序，则需要在对实际值做一次分析的同时，至少进行一次排除或减少离群值效应的分析，并讨论它们的结果之间的差异。

5.4 数据转换

最好在试验设计期间基于早期临床试验的类似数据，在分析前做出对关键指标进行转换的决定。应该在方案中对数据转换（如平方根转换、对数转换）进行详细说明，并叙述基本原理，尤其是主要指标。在标准教材中可以找到进行数据转换的一般原则，可确保满足统计方法所依据的假设，而且在许多特定的临床领域已经形成了针对特定指标的惯例。在决定是否以及如何对指标进行转换时，应优先考虑便于临床解释的尺度。

类似的考虑也适用于其他衍生指标，例如，自基线变化值、自基线变化百分比、重复测量的“曲线下面积”或两个不同指标的比值。应仔细考虑后续的临床解释，并在方案中说明衍生的合理性。与此密切相关的要点参见第2.2.2章节。

5.5 估计、置信区间及假设检验

为满足试验的主要目的，应该在方案的统计章节中详细说明待检验的假设和/或待估计的处理效应。用于完成这些任务的统计方法应当针对主要指标（以及优选的次要指标）进行描述，并明确所依据的统计模型。只要有可能，处理效应的估计应伴有置信区间，并确定其计算方法。应当说明使用基线数据以提高精度或以潜在基线差异校正估计值的任何意图，例如，使用协方差分析进行校正。

重要的是，要阐述清楚将使用单侧还是双侧统计检验，如果使用单侧检验一定要事先充分说明其合理性。如果认为假设检验不适用，那么应该给出获得统计结论的替代过程。关于单侧或双侧推断方法的问题是有争议的，在统计文献中可以找到各种各样的观点。在监管背景下，更可取的方法是将单侧检验的I类错误设置为双侧检验中使用的传统I类错误的一半，这样就保持了与双侧置信区间的一致性。双侧置信区间通常适合于估计两种处理间差异的可能大小。

所选择的特定统计模型应当反映人们对待分析指标以及试验的统计设计在医学和统计方面的目前认识状态。应充分说明在分析中待拟合的所有效应（例如在方差分析模型中），并应解释根据初步结果对这些效应进行修改的方式（如果有）。同样的考虑也适用于在协方差分析中所拟合的协变量集合（见第5.7章节）。在选择统计方法时（如参数和非参数方法），应注意主要和次要指标的统计分布，除显著性检验外，其分析结果还应包含处理效应量的统计估计值及置信区间。

应当清楚地区分主要指标的主分析与主要或次要指标的支持性分析。在方案的统计章节或统计分析计划中，还应概述主要和次要指标以外的数据的汇总和报告方式。其中应当说明为在一系列试验中实现分析一致性而采用的任何方法，例如针对安全性数据的方法。

对于已知的药理学参数、单个受试者的方案依从程度或其它生物学基础数据，整合这些信息的建模方法可以洞察实际或潜在有效性的价值，特别是对于处理效应的估计。应始终清晰地确定这些模型所依据的假设，并仔细描述任何结论的局限性。

5.6 显著性及置信水准的调整

当存在多重性时，用于临床试验数据分析常用的频率派方法可能需要对I类错误进行调整。多重性可能来源于多个主要指标（见第2.2.2章节）、处理的多重比较、随时间的多次评价和/或期中分析（见第4.5章节）。在可行的情况下，避免或减少多重性的方法有时更可取，例如，在多个指标中确定一个关键主要指标，在多重比较中选择一个关键的处理比较，对于重复测量使用汇总测量如“曲线下面积”等。在确证性分析中，除采取此类步骤，对多重性的其余任何解决办法也应当在方案中确定。应始终考虑多重性的调整，并应在分析计划中交待任何调整程序的细节，或者解释不必调整的理由。
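
作为多重性调整的一个简单例子，下面实现 Bonferroni 和 Holm 两种校正 P 值的方法。本指南并未推荐具体的调整方法，选用这两种方法以及示例 P 值均为说明性假设。

```python
def bonferroni_adjust(p_values):
    """Bonferroni 校正：每个 P 值乘以比较次数，上限为 1。"""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]

def holm_adjust(p_values):
    """Holm 逐步校正：按 P 值从小到大依次乘以 m, m-1, ...，并保持单调不减。"""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (m - rank) * p_values[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

raw = [0.01, 0.04, 0.03]
print(bonferroni_adjust(raw))
print(holm_adjust(raw))
```

Holm 法在控制总I类错误的同时，校正后的 P 值不会大于 Bonferroni 法的结果。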

5.7 亚组、交互作用及协变量

除处理之外，主要指标通常系统性地与其它影响因素相关。例如，它可能与年龄和性别等协变量相关，或者比如多中心试验中不同中心接受处理的受试者这样的特定亚组之间可能存在差异。有些情况下，对协变量影响的调整或者对亚组效应的调整是分析计划中不可缺少的部分，因此应在方案中阐明。应通过试验前的缜密考虑，确定这些协变量以及预期对主要指标有重要影响的因素，并考虑在分析中如何处理，以提高精度和补偿处理组之间的任何不平衡。如果使用一个或多个因素进行分层设计，那么在分析中应考虑这些因素。当不确定调整的潜在价值时，通常建议主要关注未调整的分析，把调整分析作为支持性分析。应特别注意中心效应和主要指标基线值的作用。不建议在主分析中校正随机化后测量的协变量，因为它们可能受到处理的影响。

处理效应本身也可能随亚组或协变量而变化，例如，处理效应可能随年龄降低或者可能在特定诊断类别的受试者中更大。某些情况下，预期会产生交互作用或对交互作用有特别兴趣（如老年病学）时，亚组分析或者包含交互项的统计模型因此成为计划的确证性分析的一部分。然而，大多数情况下亚组分析和交互作用分析应当确定为探索性的，即探索所有处理效应的一致性。一般而言，应首先在所讨论的统计模型添加交互项进行分析，辅之以在相关受试者亚组内或者由协变量定义的层内进行额外的探索性分析。对于探索性分析，应谨慎解释其分析结果，仅仅基于探索性亚组分析的治疗有效性（或缺乏有效性）或安全性的任何结论都不太可能被接受。

5.8 数据的完整性与计算机软件的可靠性

分析结果的可信性取决于用于数据管理（数据录入、存储、验证、校正和检索）以及在统计上处理数据的方法和软件（内部和外部编写）的质量和可靠性。因此，数据管理活动应当基于全面和有效的标准操作规程。用于数据管理和统计分析的计算机软件应当是可靠的，并应提供适当的软件测试过程的文件。

6. 安全性与耐受性评价

6.1 评价的范围

在所有临床试验中，安全性和耐受性（见词汇表）的评价是一个重要方面。在早期阶段，这种评价主要是探索性的，并且只对毒性的直接表达敏感，而在后期阶段，可在更大样本量的受试者中更加全面地描述药物的安全性和耐受性特征。后期阶段的对照试验代表了以无偏的方式探索任何新的潜在不良反应的重要方法，即使这些试验在这方面通常缺乏检验效能。

某些试验可针对以安全性和耐受性的优效性或等效性（与其它药物或与研究药物的其它剂量相比）的特定主张为目的进行设计。这些特定主张需得到来自确证性试验的相关证据支持，就像相应的有效性主张需要证据支持一样。

6.2 指标选择与数据收集

在任何临床试验中，选择用于评价药物安全性和耐受性的方法和测量取决于许多因素，包括对与药物密切相关的不良反应的了解，来自非临床和早期临床研究的信息以及特定药物的药效/药代动力学特性的可能结果、给药方式、待研究的受试者类型，以及试验持续时间。临床化学和血液学的实验室检查、生命体征以及临床不良事件（疾病、体征和症状）通常构成安全性和耐受性数据的主体。发生严重不良事件以及因不良事件导致治疗终止对于注册是特别重要的（见ICH E2A和ICH E3）。

此外，建议在整个临床试验规划中采用一致的方法来收集和评价数据，以便合并来自不同试验的数据。使用通用的不良事件词典尤为重要。该词典具有一种结构，提供了在三个不同层级上汇总不良事件数据的可能性，即系统-器官分类、首选术语和收录术语（见词汇表）。首选术语通常是汇总不良事件的层级，在数据的描述性展示中，可以汇集属于同一系统-器官分类的首选术语（见ICH M1）。

6.3 待评价受试者集及数据展示

对于整体安全性和耐受性评价，待汇总的受试者集通常被定义为那些接受至少一个剂量研究药物的受试者。应尽可能全面地从这些受试者中收集安全性和耐受性指标，包括不良事件类型、严重程度、发病和持续时间（见ICH E2B）。可能需要在特定的亚组人群，如女性、老年人（见ICH E7）、严重疾病或那些有常见伴随治疗的人群，进行额外的安全性及耐受性评价。这些评价可能需要解决更加特殊的问题（见ICH E3）。

在评价过程中需要注意所有安全性和耐受性指标，并且在方案中应阐明方法。所有不良事件都应报告，无论它们是否被认为与治疗有关。在评价中应当考虑研究人群中的所有可用数据。应当谨慎地定义测量值的单位和实验室指标的参考范围，如果在同一试验中出现不同的单位或不同的参考范围（例如涉及一个以上的实验室），则测量值应当被适当标准化，以便统一评价。应预先确定毒性分级量表的使用，并说明合理性。

某种不良事件的发生率通常以经历事件的受试者数量与处于风险中的受试者数量之比来表示。然而，如何评价发生率并不总是显而易见的，例如，根据情况可考虑把暴露的受试者数量或暴露程度（用人年表示）作为分母。无论计算的目的是估计风险还是在处理组之间进行比较，重要的是要在方案中给出定义。如果计划进行长周期治疗，并预期有相当比例的退出治疗或死亡，这一点尤其重要。对于这些情况，应考虑生存分析方法，并计算累积不良事件率，以避免低估的危险。
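
不同分母的选择会得到不同的发生率，下面用一组虚构数据示意三种算法：按受试者人数的粗发生率、按暴露人年的发生率，以及把退出治疗视为删失的 Kaplan-Meier 累积发生率。函数名和数据都是示例性假设。

```python
def crude_incidence(n_with_event, n_exposed):
    """粗发生率：发生事件的受试者数 / 暴露的受试者数。"""
    return n_with_event / n_exposed

def rate_per_100_person_years(n_with_event, exposure_days):
    """每 100 人年的事件发生率；exposure_days 为每位受试者的暴露天数。"""
    person_years = sum(exposure_days) / 365.25
    return 100 * n_with_event / person_years

def km_cumulative_incidence(times, had_event):
    """Kaplan-Meier 法估计的累积发生率；had_event 为假表示删失。"""
    records = sorted(zip(times, had_event))
    at_risk = len(records)
    event_free = 1.0
    i = 0
    while i < len(records):
        t, j, events = records[i][0], i, 0
        while j < len(records) and records[j][0] == t:   # 合并同一时间点的记录
            events += 1 if records[j][1] else 0
            j += 1
        if events:
            event_free *= 1 - events / at_risk
        at_risk -= j - i                                  # 事件和删失均离开风险集
        i = j
    return 1 - event_free

days = [30, 60, 60, 90, 120, 180, 180, 180, 180, 180]    # 至首次事件或退出的天数
event = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]                   # 1 表示发生了不良事件
print(crude_incidence(sum(event), len(days)))
print(rate_per_100_person_years(sum(event), days))
print(km_cumulative_incidence(days, event))
```

在这组数据中有受试者提前退出，粗发生率低于 Kaplan-Meier 累积发生率，与正文中“避免低估的危险”的提示一致。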

对于存在大量体征和症状的背景噪声的情况（如精神病试验），在估计不同不良事件的风险时，应考虑对此进行解释的方法。一种方法是利用“治疗引发事件”（见词汇表）的概念，即只有当不良事件出现或相对于治疗前基线发生恶化时，才记录它们。

减少背景噪声影响的其他方法也许是合适的，如忽略轻度不良事件，或再次随访时观察到的事件才可计入分子。这些方法应在方案中解释并说明其合理性。

6.4 统计评价

安全性与耐受性的研究是一个多维问题。对于任何药物，虽然通常可以预见和监测到某些特定不良反应，但由于可能的不良反应范围非常大，新的和不可预见的反应总可能出现。此外，在违背方案之后经历的不良事件可能引入偏倚，如使用违禁药物。这个背景使药物安全性和耐受性的统计分析和评价变得困难，并且意味着来自确证性临床试验的结论性信息是一种例外而不是通例。

大多数试验中，应用数据的统计描述方法，辅以有助于解释的置信区间计算，是说明安全性和耐受性的最好方法。利用图示方法表达处理组间和受试者间不良事件的模式也有价值。

计算P值有时是有意义的，无论作为评价有关特定差异的辅助手段，还是作为“标记”符号以引起对大量安全性与耐受性指标所出现差异的进一步关注。这对于实验室数据尤其有用，否则可能难以适当地进行汇总。建议对实验室数据既要进行定量分析，如对处理组均数的评价，又要进行定性分析，如计算高于或低于某些阈值的比例。

如果使用假设检验，对多重性的统计调整以量化I类错误是合适的，但是II类错误通常更值得关注。如果未做多重性调整，应谨慎解释常规的统计显著性。

大多数试验中，与阳性对照药物或安慰剂相比，研究者会试图确定未出现临床上不可接受的安全性及耐受性方面的差异。与有效性的非劣效性或等效性评价一样，这种情况下使用置信区间比假设检验更可取，因为置信区间往往可以清楚地显示由低发生率所引起的精度变差。

6.5 综合性总结

在研究产品的开发过程中，特别是在上市申请时，通常会将不同试验的药物安全性与耐受性的特性进行汇总。然而，这样汇总是否可用取决于每一个具有高数据质量的、充分和控制良好的试验。

药物的总体可用性始终是风险与获益之间的平衡问题，单个试验中也可考虑这一观点，即使风险/获益评估通常在整个临床试验的总结阶段进行。（见第7.2.2章节）

有关安全性与耐受性报告的更多细节，见ICH E3第12章。

7. 报告

7.1 评价与报告

如引言所述，临床研究报告的结构与内容是ICH E3的主题。该ICH指南充分地涵盖了统计工作报告并适当整合临床和其它资料，本章节因此相对简短。

如第5章节所述，在试验的计划阶段，分析的主要特征应在方案中确定。当试验结束而且数据经整理可供初步检查时，如第5章节提到的按计划进行盲态审核是有价值的。在分析前盲态审核应当包括相关决定，例如，从分析集中排除受试者或数据，可能的数据转换的核查，离群值的定义，将近期其它研究中确定的重要协变量加入模型，参数或非参数方法的重新考虑，等等。这些决定应在报告中加以描述，而且应当与统计师获得处理编码之后做出的决定加以区别，因为盲态下的决定通常会减少产生偏倚的可能性。参与非盲期中分析的统计师或其他人员不应参与盲态审核或修订统计分析计划。数据中如果存在明显的处理诱导效应的可能，将会削弱盲态效果，此时，盲态审核需要特别谨慎。

许多更详细的报告内容和表格应在盲态审核时或盲态审核前完成，以便在实际分析时有一个包括各方面的完整计划，如受试者选择、数据选择与修改、数据汇总与列表、估计与假设检验等。一旦完成数据验证，应按照预先拟定的计划进行分析，越依从于这些计划，结果的可信度越高。应特别注意在方案、方案修订以及基于数据盲态审核更新的统计分析计划中所描述的计划分析与实际分析之间的任何差异。应对偏离计划的分析做出详细解释。

进入试验的所有受试者，无论是否纳入分析，都应在报告中说明。排除在分析之外的所有原因都应记录，还应记录受试者被纳入全分析集但未被纳入符合方案集的原因。类似地，对于纳入分析集的所有受试者，所有重要指标的测量值在所有相关时间点都应该进行说明。

应仔细考虑受试者或数据的所有缺失、退出治疗和重要方案违背对主要指标的主分析的影响。应确定失访、退出治疗或严重方案违背的受试者，并对他们进行描述性分析，包括他们缺失的原因及其与处理和结局的关系。

描述性统计是报告不可缺少的部分。合适的表格和/或图示应清楚地说明主要和次要指标、关键预后指标和人口统计学指标的重要特征。应特别仔细地描述与试验目的有关的主分析的结果。当报告显著性检验结果时，应报告精确的P值（如“P=0.034”）而不是参考临界值。

尽管临床试验分析的主要目标是回答其主要目的提出的问题，但在非盲分析过程中，基于观察数据的新问题很可能会出现，随之可能需要额外的或许复杂的统计分析。报告中应严格区分这种额外工作与方案中计划的工作。

对于计划分析中未被预先定义为协变量但仍然具有某些预后重要性的基线测量，机会作用可能会导致它们在处理组间出现无法预料的不均衡。最好的解决办法是，证明针对这些不均衡进行校正的补充分析得出了与计划分析基本相同的结论。否则，应讨论这种不均衡对结论的影响。

一般而言，应少用计划外分析。当认为处理效应可能随某个或某些其他因素而变化时，会用到计划外分析，比如会尝试确定特别获益的受试者亚组。众所周知，计划外亚组分析有过度解释的潜在风险（见第5.7章节），应谨慎避免。虽然当受试者亚组中未显示出获益或具有不良反应时会出现类似的解释问题，但应该恰当地评价这些可能性并予以报告。

最后，应根据临床试验结果的分析、解释及展示做出统计判断。为此，试验统计师应是负责临床研究报告的小组成员之一，还应批准临床报告。

7.2 临床数据库的总结

上市申请需要对所有报告临床试验的安全性和有效性证据进行全面总结和综合（欧盟的专家报告、美国的综合总结报告、日本的概要），在适当的时候还可能伴随结果的统计汇总。

总结中有一些特定的统计关注的领域：描述在临床试验项目过程中受试人群的人口统计学和临床特征；通过考虑相关（通常有对照组）试验的结果并强调它们相互印证或矛盾的程度来解决有效性的关键问题；对于其结果有助于上市申请的所有试验，总结从它们的合并数据库中可获得的安全信息，并确定潜在的安全问题。在设计临床项目中，应认真关注测量的统一定义和收集，这将有助于随后一系列试验的解释，特别是如果不同试验之间的测量可能被合并时。应该选择和使用可记录用药细节、病史和不良事件的通用词典。对主要和次要指标采用通用定义几乎总是有价值的，这对meta分析极为重要。关键有效性指标的测量方式、相对于随机化/入组的评价时机、方案违背和偏离的应对以及可能的预后因素定义都应该保持一致，除非有合适的理由不这么做。

应当详细描述用于不同试验之间数据合并的任何统计程序。应注意与试验选择有关的偏倚的可能性、试验结果的同质性、以及各种变异来源的恰当建模。应探索结论对假设和选择的敏感性。

7.2.1 有效性数据

单个临床试验的样本量应该总是大到足以满足其目的的程度。通过总结一系列解决基本相同的关键有效性问题的临床试验，也可以获得额外的有价值的信息。为了便于比较，应该以相同的形式，通常是关注于估计值和置信限的表格和图形，呈现一系列试验的主要结果。使用meta分析技术来合并这些估计值常常是一个有用的补充，因为它允许对处理效应量生成更精确的总体估计，并提供完整而简明的试验结果总结。在一些特殊情况下，meta分析方法也可能是通过整体假设检验提供充分的有效性整体证据的最适当方式，或者唯一方式。当用于此目的时，meta分析应该有它自己的前瞻性书面方案。
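
合并一系列试验估计值的一种常见做法是固定效应的逆方差加权法，下面给出一个最小草图，同时计算 Cochran Q 统计量以粗略考察试验间的同质性。本指南并未指定 meta 分析的具体方法，此处的方法选择和示例数值均为假设。

```python
from math import sqrt
from statistics import NormalDist

def fixed_effect_pool(estimates, std_errors, level=0.95):
    """逆方差加权的固定效应合并；返回合并估计值、置信区间和 Cochran Q。"""
    weights = [1 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total
    se_pooled = sqrt(1 / total)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    q = sum(w * (est - pooled) ** 2 for w, est in zip(weights, estimates))
    return pooled, (pooled - z * se_pooled, pooled + z * se_pooled), q

# 三个虚构试验的处理间差异估计值及其标准误
pooled, ci, q = fixed_effect_pool([2.1, 3.4, 2.8], [1.0, 1.5, 0.8])
print(pooled, ci, q)
```

合并后的置信区间比任何单个试验的都窄，这对应正文所说的“更精确的总体估计”。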

7.2.2 安全性数据

在总结安全性数据时，重要的是要彻底检查安全性数据库，以寻找潜在毒性的任何迹象，并通过寻找相关的支持性观察模式来跟踪这些迹象。将人暴露于药物的所有安全数据进行合并，能提供重要的信息来源，因为较大的样本量能提供发现更罕见不良事件的最佳机会，并且可能提供估计罕见不良事件近似发生率的最佳机会。然而，由于缺乏对照组，难以评价来自该数据库的发生率数据，来自对照试验的数据在克服这种困难方面特别有价值。应合并具有相同对照组（安慰剂或特定阳性对照）的研究的结果，并分别展示每个提供充足数据的对照组的结果。

所有通过数据探索发现的潜在毒性的迹象都应报告。评价这些潜在不良反应的现实情况应考虑到由于多次比较而产生的多重性问题。还应适当地使用生存分析方法进行评价，以探索不良事件的发生率与暴露时间和/或随访时间的潜在关系。应适当地量化确定的不良反应的风险，以便正确评价风险/获益关系。

词汇表

Glossary	Content
贝叶斯方法	是指为某些参数（如处理效应）提供后验概率分布的数据分析方法。后验概率分布由该参数的观测数据和先验概率分布衍生而来，被用作统计推断的基础。
偏倚（统计的和操作的）	是指与临床试验的设计、实施、分析和结果评价有关的任何因素导致的处理效应估计值偏离其真实值的系统趋势。由实施偏离所引入的偏倚称为“操作”偏倚，而上述其他来源的偏倚称为“统计”偏倚。
盲态审核	是指在试验完成（最后一位受试者的最后一次观察）到揭盲这段时间内对数据的检查和评价，旨在最终确定分析计划。
内容效度	是指一个指标（如量表）测量它所预期测量的内容的程度。
双模拟	是指在临床试验中当两种处理不能做到完全相同时，使处理实施仍能保持盲态的一种技术。先准备处理A（阳性药和不能区分的安慰剂）和处理B（阳性药和不能区分的安慰剂），然后受试者接受两套处理：A（阳性药）和B（安慰剂）或者A（安慰剂）和B（阳性药）。
脱落	是指临床试验的受试者由于任何原因不能继续按研究方案进行到所要求的最后一次随访。
等效性试验	是指主要目的为证实两种或多种处理的应答差别无重要临床意义的试验。通常以真实的处理间差异落在临床上可接受的等效性界值上下限之间来证实等效性。
频率派方法	是指在假设重现相同实验情境时，用某些结局的发生频率做出解释的统计方法，例如显著性检验和置信区间。
全分析集	是指尽可能接近符合意向性治疗原则的理想的受试者集。该数据集是从所有随机化的受试者中以最少的和合理的方法排除受试者后得到的。
可推论性，推论	是指将临床试验的发现从参与试验的受试者可靠地外推到更广泛的患者人群和临床环境的程度。
全局评价指标	是指将客观指标和研究者对受试者的状态或状态变化的总体印象综合起来所设定的一个单一指标，通常是一个有序分类量表。
独立数据监查委员会（数据和安全监查委员会、监查委员会、数据监查委员会）	独立数据监查委员会由申办方设立，职责是定期评价临床试验进度、安全性数据以及关键有效性终点，并向申办方建议是否继续、修改或终止试验。
意向性治疗原则	是指基于受试者的治疗意向（即计划的治疗方案）而不是实际给予的治疗进行评价的原则，该原则可以对治疗策略的效应做出最佳评价。它的结果是，分配到每一个处理组的受试者即应作为该组的成员被随访、评价和分析，无论他们是否依从于所计划的治疗过程。
交互作用（定性和定量）	是指处理间的比较（如研究产品与对照之间的差异）依赖于另一因素（如中心）的情况。定量交互作用是指该因素的不同水平之间在量的比较上有差异，而定性交互作用是指比较结果至少在该因素某一水平上显示方向不同。
评价者间信度	是指不同评价者在不同场合使用评价工具时产生相同结果的可靠程度。
评价者内信度	是指同一评价者在不同场合使用评价工具时产生相同结果的可靠程度。
期中分析	是指正式完成临床试验前，比较处理组间的有效性或安全性所做的任何分析。
Meta 分析	是指来源于针对同一个问题的两个或多个试验的量化证据的规范评价，常见的方法是将各试验的汇总统计量进行统计合并，有时也采用原始数据的统计合并方法。
多中心试验	是指多个研究者在多个场所按同一个方案实施的临床试验。
非劣效性试验	是指主要目的为验证研究产品的应答在临床上不劣于对照（阳性药或安慰剂对照）的试验。
首选术语和收录术语	在分层级医学词典中，例如MedDRA，收录术语是词典术语的最低层级，以研究者的描述进行编码。首选术语是收录术语的分组层级，通常用于报告发生率。例如，研究者写的是“左臂疼痛”，收录术语编码为：“关节疼痛”，在首选术语层级上报告为“关节痛”。
符合方案集（有效病例，有效性样本，可评价的受试者样本）	是指由充分依从于方案的受试者子集所产生的数据集，以确保这些数据按照所依据的科学模型可能展现出处理效应。依从性包括以下一些考虑：暴露于处理、可获得测量值以及无重大方案违背等。
安全性和耐受性	医疗产品的安全性是指受试者的医学风险，通常在临床试验中由实验室检查（包括临床生化和血液学）、生命体征、临床不良事件（疾病、体征和症状），以及其他特殊的安全性检查（如心电图、眼科检查）等来评价。医疗产品的耐受性是指受试者能耐受明显不良反应的程度。
统计分析计划	是指更技术性地和更详细地阐述方案中描述的分析要点的文件，包括对主要和次要指标及其他数据进行统计分析的详细程序。
优效性试验	是指主要目的为显示研究产品的应答优于对照（阳性药或安慰剂对照）的试验。
替代指标	是指在直接测量临床效应不可行或不实际的情况下，用于间接测量临床效应的指标。
处理效应	是指在临床试验中归因于处理的效应。在大多数临床试验中，感兴趣的处理效应通过两个或多个处理间的比较体现。
治疗引发事件	是指出现在治疗期间的、但在治疗前未曾发生或比治疗前明显恶化的事件。
试验统计师	是指同时具备丰富的教育/训练和经验，可以实施本指南中的原则并负责临床试验统计方面的统计师。

ICH E9

ICH E9(R1) Addendum: Statistical Principles for Clinical Trials

中文版

ICH E9(R1) 临床试验中的估计目标与敏感性分析（E9指导原则增补文件）

A.1. PURPOSE AND SCOPE

To properly inform decision making by pharmaceutical companies, regulators, patients, physicians and other stakeholders, clear descriptions of the benefits and risks of a treatment (medicine) for a given medical condition should be made available. Without such clarity, there is a concern that the reported “treatment effect” will be misunderstood. This addendum presents a structured framework to strengthen the dialogue between disciplines involved in the formulation of clinical trial objectives, design, conduct, analysis and interpretation, as well as between sponsor and regulator regarding the treatment effect(s) of interest that a clinical trial should address.

Precision in describing a treatment effect of interest is facilitated by constructing the “estimand” (see Glossary; A.3.) corresponding to a clinical question of interest. Clarity requires a thoughtful envisioning of “intercurrent events” (see Glossary; A.3.1.) such as discontinuation of assigned treatment, use of an additional or alternative treatment and terminal events such as death. The description of an estimand should reflect the clinical question of interest in respect of these intercurrent events, and this addendum introduces strategies to reflect different questions of interest that might be posed. The choice of strategies can influence how more conventional attributes of a trial are reflected when describing the clinical question, for example the treatments, population or the variable (endpoint) of interest.

The statistical analysis of clinical trial data should be aligned to the estimand. This addendum clarifies the role of “sensitivity analysis” (see Glossary) to explore robustness of conclusions from the main statistical analysis.

Throughout the addendum, references to the original ICH E9 are made using x.y. References within this addendum are made using A.x.y.

This addendum clarifies and extends ICH E9 in respect of the following topics. Firstly, ICH E9 introduced the Intention-To-Treat (ITT) principle in connection with the effect of a treatment policy in a randomised controlled trial, whereby subjects are followed, assessed and analysed irrespective of their compliance to the planned course of treatment, indicating that preservation of randomisation provides a secure foundation for statistical tests. Multiple consequences arising from the ITT principle can be distinguished. Firstly, that the trial analysis should include all subjects relevant for the research question. Secondly, that subjects should be included in the analysis as randomised. Taken directly from the definition of the ITT principle (see ICH E9 Glossary), a third consequence is that subjects should be followed-up and assessed regardless of adherence to the planned course of treatment and that those assessments should be used in the analysis. It remains undisputed that randomisation is a cornerstone of controlled clinical trials and that analysis should aim at exploiting the advantages of randomisation to the greatest extent possible. However, the question remains whether estimating an effect in accordance with the ITT principle always represents the treatment effect of greatest relevance to regulatory and clinical decision making. The framework outlined in this addendum gives a basis for describing different treatment effects and some points to consider for the design and analysis of trials to give estimates of these treatment effects that are reliable for decision making.

Secondly, issues considered generally under data handling and “missing data” (see Glossary) are re-visited. Two important distinctions are made. Firstly, the addendum distinguishes discontinuation of randomised treatment from study withdrawal. The former represents an intercurrent event, to be addressed in the precise specification of the trial objective through the estimand. The latter gives rise to missing data to be addressed in the statistical analysis. Consider, for example, a subject switching treatments in an oncology trial, and a subject for whom no outcome event can be observed because the trial is completed. The former represents an intercurrent event and the clinical question of interest in respect of that should be clear. The latter is administrative censoring which needs to be addressed as a missing data problem in the statistical analysis. Having clarity in the estimand gives a basis for planning which data need to be collected and hence which data, when not collected, present a missing data problem to be addressed in the statistical analysis. In turn, methods to address the problem presented by missing data can be selected to align with the estimand. Secondly, the addendum highlights the distinct consequences of different intercurrent events. Events such as discontinuation of treatment, switching between treatments, or use of an additional medication may render the later measurements of the variable irrelevant or difficult to interpret even when they can be collected. Measurements after a subject dies do not exist.

Thirdly, issues related to the concept of analysis sets are considered in the framework. Section 5.2. strongly recommends that analysis of superiority trials be based on the full analysis set, defined to be as close as possible to including all randomised subjects. However, trials often include repeated measurements on the same subject. Elimination of some planned measurements on some subjects, perhaps because the measurement is considered irrelevant or difficult to interpret, can have similar consequences to excluding subjects altogether from the full analysis set, i.e. that the initial randomisation is not fully preserved. A consequence of this is that the theoretical benefits that randomisation confers on testing hypotheses about treatment effects and the practical benefits of balancing confounding factors at baseline can be diminished. In addition, a meaningful value of the outcome variable might not exist, as when the subject dies. Section 5.2. does not directly address these issues. Clarity is introduced by carefully defining the treatment effect of interest in a way that determines both the population of subjects to be included in the estimation of that treatment effect and the observations from each subject to be included in the analysis considering the occurrence of intercurrent events. The meaning and role of an analysis of the per protocol set is also re-visited in this addendum; in particular whether the need to explore the impact of protocol violations and deviations can be addressed in a way that is less biased and more interpretable than naïve analysis of the per protocol set.

Finally, the concept of robustness (see 1.2.) is given expanded discussion under the heading of sensitivity analysis. A distinction is made between the sensitivity of inference to the assumptions of a chosen method of analysis and the sensitivity to the choice of analytic approach more broadly. With precise specification of an agreed estimand and a method of analysis that is both aligned to the estimand and pre-specified to a level of detail that it can be replicated precisely by a third party, regulatory interest can focus on sensitivity to deviations from assumptions and limitations in the data in respect of a particular analysis.

The principles outlined in this addendum are relevant whenever a treatment effect is estimated, or a hypothesis related to a treatment effect is tested, whether related to efficacy or safety. While the main focus is on randomised clinical trials, the principles are also applicable for single arm trials and observational studies. The framework applies to any data type, including longitudinal, time-to-first event, and recurrent event data. Regulatory interest in the application of the principles outlined will be greater for confirmatory clinical trials and, where used to generate confirmatory conclusions, for data integrated across trials.

A.2. A FRAMEWORK TO ALIGN PLANNING, DESIGN, CONDUCT, ANALYSIS AND INTERPRETATION

Trial planning should proceed in sequence (Figure 1). Clear trial objectives should be translated into key clinical questions of interest by defining suitable estimands. An estimand defines the target of estimation for a particular trial objective (i.e. “what is to be estimated”, see A.3.). A suitable method of estimation (i.e. the analytic approach, referred to as the main “estimator”, see Glossary) can then be selected (see A.5.1.). The main estimator will be underpinned by certain assumptions. To explore the robustness of inferences from the main estimator to deviations from its underlying assumptions, a sensitivity analysis should be conducted, in the form of one or more analyses, targeting the same estimand (see A.5.2.).

Figure 1: Aligning target of estimation, method of estimation, and sensitivity analysis, for a given trial objective

This framework enables proper trial planning that clearly distinguishes between the target of estimation (trial objective, estimand), the method of estimation (estimator), the numerical result (“estimate”, see Glossary), and a sensitivity analysis. This will assist sponsors in planning trials, regulators in their reviews, and will enhance the interactions between these parties when discussing the suitability of clinical trial designs, and the interpretation of clinical trial results.

The specification of appropriate estimands (see A.3.) will usually be the main determinant for aspects of trial design, conduct (see A.4.) and analysis (see A.5.).

A.3. ESTIMANDS

Central questions for drug development and licensing are to establish the existence, and to estimate the magnitude, of treatment effects: how the outcome of treatment compares to what would have happened to the same subjects under alternative treatment (i.e. had they not received the treatment, or had they received a different treatment). An estimand is a precise description of the treatment effect reflecting the clinical question posed by a given clinical trial objective. It summarises at a population level what the outcomes would be in the same patients under different treatment conditions being compared. The targets of estimation are to be defined in advance of a clinical trial. Once defined, a trial can be designed to enable reliable estimation of the targeted treatment effect.

The description of an estimand involves precise specifications of certain attributes, which should be developed based not only on clinical considerations but also on how intercurrent events are reflected in the clinical question of interest. Section A.3.1. introduces intercurrent events. Section A.3.2. introduces strategies to describe the question of interest in respect of intercurrent events. Section A.3.3. describes the attributes of an estimand and Section A.3.4. gives considerations for its construction. It is critically important to understand the differences between the strategies and to precisely articulate which are used in constructing the estimand.

A.3.1. Intercurrent Events to be Reflected in the Clinical Question of Interest

Intercurrent events are events occurring after treatment initiation that affect either the interpretation or the existence of the measurements associated with the clinical question of interest. It is necessary to address intercurrent events when describing the clinical question of interest in order to precisely define the treatment effect that is to be estimated.

Intercurrent events need to be considered in the description of a treatment effect because measurements of the variable can be influenced by the intercurrent event and the occurrence of the intercurrent event may depend on treatment. For example, two patients might be exposed initially to the same treatment and provide the same measure of outcome, but if one patient has received additional medication, the information that the two measures give about the treatment differs between the two patients. Furthermore, whether a patient needs to take additional medication, and whether or not a patient can continue taking treatment, may depend on the treatment to which they are exposed. Unlike missing data, intercurrent events are not to be thought of as a drawback to be avoided in clinical trials. Discontinuation of prescribed treatment, use of additional medication, and other such events may occur in clinical practice as they do in clinical trials, and their occurrence needs to be considered explicitly when defining the clinical question of interest.

Examples of intercurrent events that can affect interpretation of the measurements include discontinuation of assigned treatment and use of an additional or alternative therapy. Use of an additional or alternative therapy can take multiple forms, including change to background or concomitant therapy and switching between treatments of interest. Examples of intercurrent events that would affect the existence of the measurements include terminal events such as death and leg amputation (when assessing symptoms of diabetic foot ulcers), when these events are not part of the variable itself. Certain clinical events can also be intercurrent events, when their occurrence, or non-occurrence, defines a principal stratum of interest (see A.3.2.). Examples include tumour shrinkage defining objective response when assessing a treatment effect on duration of response in oncology and occurrence of infection when assessing a treatment effect on severity of infections occurring after vaccination of initially uninfected subjects.

An intercurrent event might be identified solely by the event itself, such as discontinuation of treatment, or might be more granular. For example, the reason for the event might be specified, such as discontinuation of treatment due to toxicity, or due to lack of efficacy; the event might require to be of certain magnitude or degree, such as use of additional medication exceeding a specified duration or dose; or the timing of the event might be specified, perhaps in relation to its proximity to the assessment of the variable. Some events will affect interpretation of the outcome measurements indefinitely, such as discontinuation of treatment, whilst others will affect interpretation only temporarily, such as short-term use of additional treatment. Indeed, additional or alternative treatments can be diverse; either replacing or supplementing a treatment on which the subject is experiencing inadequate benefit, as an alternative where a subject is not tolerating their assigned treatment, or as a short-term acute treatment to manage a temporary flare in disease symptoms. In a clinical trial, additional or alternative treatments are often identified as e.g. background treatment, rescue medication, prohibited medication, distinguishing their different roles and allowing them to be considered separately. The additional granularity, identifying different intercurrent events, is required if different strategies are to be used. If the intercurrent event for which a strategy needs to be selected depends not only on, for example, failure to continue with treatment, but also on the reason, magnitude or timing associated with that failure, this additional information should be defined and recorded accurately in the clinical trial. The description of intercurrent events might in theory reflect very specific details of treatment and follow-up, such as a single missed dose of a chronic treatment or a dose taken at the wrong time of day. Where such specific criteria are not expected to affect interpretation of the variable, they would not need to be addressed as intercurrent events.

As indicated above, consideration of intercurrent events is required when constructing the estimand. Because the estimand is to be defined in advance of trial design, neither study withdrawal nor other reasons for missing data (e.g. administrative censoring in trials with survival outcomes) are in themselves intercurrent events. Subjects who withdraw from the trial may have experienced an intercurrent event before withdrawal.

A.3.2. Strategies for Addressing Intercurrent Events when Defining the Clinical Question of Interest

Descriptions of various strategies are listed below, each reflecting a different clinical question of interest in respect of a particular intercurrent event. Whether or not the naming convention is used, it is required that the choices of strategy are unambiguously clear once the estimand is constructed. It is not necessary to use the same strategy to address all intercurrent events. Indeed, different strategies will often be used to reflect the clinical question of interest in respect of different intercurrent events. Section A.3.4. gives some considerations on selecting strategies to construct an estimand.

Treatment policy strategy

The occurrence of the intercurrent event is considered irrelevant in defining the treatment effect of interest: the value for the variable of interest is used regardless of whether or not the intercurrent event occurs. For example, when specifying how to address use of additional medication as an intercurrent event, the values of the variable of interest are used whether or not the patient takes additional medication.

If applied in relation to whether or not a patient continues treatment, and whether or not a patient experiences changes in other treatments (e.g. background or concomitant treatments), the intercurrent event is considered to be part of the treatments being compared. In that case, this reflects the comparison described in the ICH E9 Glossary (under ITT Principle) as the effect of a treatment policy.

In general, the treatment policy strategy cannot be implemented for intercurrent events that are terminal events, since values for the variable after the intercurrent event do not exist. For example, an estimand based on this strategy cannot be constructed with respect to a variable that cannot be measured due to death.
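As a minimal sketch of the treatment policy strategy, the example below compares mean values between arms using every recorded value, whether or not the patient took additional medication. The record layout (`arm`, `value`, `took_additional_medication`), the arm labels and the data are hypothetical and serve only to illustrate the idea; they are not taken from this guideline.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Patient:
    arm: str                          # "test" or "control" (hypothetical labels)
    value: float                      # variable of interest at the planned timepoint
    took_additional_medication: bool  # intercurrent event flag (recorded, not used to filter)


def treatment_policy_difference(patients):
    """Difference in means (test minus control) using all values,
    regardless of whether the intercurrent event occurred."""
    test = [p.value for p in patients if p.arm == "test"]
    control = [p.value for p in patients if p.arm == "control"]
    return mean(test) - mean(control)


# Illustrative data only.
patients = [
    Patient("test", 4.0, False),
    Patient("test", 6.0, True),
    Patient("control", 8.0, True),
    Patient("control", 10.0, False),
]
difference = treatment_policy_difference(patients)
```

The intercurrent event flag is still recorded even though it does not restrict the comparison, because its frequency in each arm may be needed to interpret the result.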

Hypothetical strategies

A scenario is envisaged in which the intercurrent event would not occur: the value of the variable to reflect the clinical question of interest is the value which the variable would have taken in the hypothetical scenario defined.

A wide variety of hypothetical scenarios can be envisaged, but some scenarios are likely to be of more clinical or regulatory interest than others. For example, it may be of clinical or regulatory importance to consider the effect of a treatment under different conditions from those of the trial that can be carried out. Specifically, when additional medication must be made available for ethical reasons, a treatment effect of interest might concern the outcomes if the additional medication was not available. A very different hypothetical scenario might postulate that intercurrent events would not occur, or that different intercurrent events would occur. For example, for a subject that will suffer an adverse event and discontinue treatment, it might be considered whether the same subject would not have the adverse event or could continue treatment in spite of the adverse event. The clinical and regulatory interest of such hypotheticals is limited and would usually depend on a clear understanding of why and how the intercurrent event or its consequences would be expected to be different in clinical practice than in the clinical trial.

If a hypothetical strategy is proposed, it should be made clear what hypothetical scenario is envisaged. For example, wording such as “if the patient does not take additional medication” might lead to confusion as to whether the patient hypothetically does not take additional medication because it is not available or because the particular patient is supposed not to require it.

Composite variable strategies

This relates to the variable of interest (see A.3.3.). An intercurrent event is considered in itself to be informative about the patient’s outcome and is therefore incorporated into the definition of the variable. For example, a patient who discontinues treatment because of toxicity may be considered not to have been successfully treated. If the outcome variable was already success or failure, discontinuation of treatment for toxicity would simply be considered another mode of failure. Composite variable strategies do not need to be limited to dichotomous outcomes, however. For example, in a trial measuring physical functioning, a variable might be constructed using outcomes on a continuous scale, with subjects who die being attributed a value reflecting the lack of ability to function. Composite variable strategies can be viewed as implementing the intention-to-treat principle in some cases where the original measurement of the variable might not exist or might not be meaningful, but where the intercurrent event itself meaningfully describes the patient’s outcome, such as when the patient dies.

Terminal events, such as death, are perhaps the most salient examples of the need for the composite strategy. If a treatment saves lives, its effect on various measures in surviving patients may be of interest, but it would be inappropriate to say that the summary measure of interest was only the average value of some numerical measure in survivors. The outcome of interest is survival along with the numerical measures. For example, progression-free survival in oncology trials measures the treatment effect on a combination of the growth of the tumour and survival.
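The dichotomous case described above can be sketched as follows: discontinuation of treatment for toxicity is folded into the variable as another mode of failure. The field names and the example data are hypothetical, and the sketch covers only a success or failure variable, not the continuous-scale case.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    responded: bool                  # response on the original measurement
    discontinued_for_toxicity: bool  # intercurrent event incorporated into the variable


def composite_success(outcome: Outcome) -> bool:
    """Success requires a response AND no discontinuation for toxicity."""
    return outcome.responded and not outcome.discontinued_for_toxicity


def success_proportion(outcomes) -> float:
    """Population-level summary: proportion of patients with composite success."""
    return sum(composite_success(o) for o in outcomes) / len(outcomes)


# Illustrative data only.
outcomes = [
    Outcome(responded=True, discontinued_for_toxicity=False),
    Outcome(responded=True, discontinued_for_toxicity=True),   # counted as failure
    Outcome(responded=False, discontinued_for_toxicity=False),
    Outcome(responded=True, discontinued_for_toxicity=False),
]
proportion = success_proportion(outcomes)
```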

While on treatment strategies

For this strategy, response to treatment prior to the occurrence of the intercurrent event is of interest. Terminology for this strategy will depend on the intercurrent event of interest; e.g. “while alive”, when considering death as an intercurrent event.

If a variable is measured repeatedly, its values up to the time of the intercurrent event may be considered relevant for the clinical question, rather than the value at the same fixed timepoint for all subjects. The same applies to the occurrence of a binary outcome of interest up to the time of the intercurrent event. For example, subjects with a terminal illness may discontinue a purely symptomatic treatment because they die, yet the success of the treatment can be measured based on the effect on symptoms before death. Alternatively, subjects might discontinue treatment and, in some circumstances, it will be of interest to assess the risk of an adverse drug reaction while the patient is exposed to treatment.

Like the composite variable strategy, the while on treatment strategy can hence be thought of as impacting the definition of the variable, in this case by restricting the observation time of interest to the time before the intercurrent event. Particular care is required if the occurrence of the intercurrent event differs between the treatments being compared (see A.3.3.).
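A minimal sketch of restricting the observation time, assuming hypothetical records that hold adverse reaction times, the time of treatment discontinuation (if any) and the end of follow-up, all in weeks: only events and exposure time before the intercurrent event are counted.

```python
def on_treatment_rate(records):
    """Events per unit of exposure time, counting only time up to the
    intercurrent event (or the end of follow-up if it never occurs)."""
    events = 0
    exposure = 0.0
    for r in records:
        stop = r["followup_end"]
        if r["discontinued_at"] is not None:
            stop = min(stop, r["discontinued_at"])
        exposure += stop
        events += sum(1 for t in r["event_times"] if t <= stop)
    return events / exposure


# Illustrative data only.
records = [
    {"event_times": [10, 40], "discontinued_at": 30, "followup_end": 52},
    {"event_times": [20], "discontinued_at": None, "followup_end": 52},
    {"event_times": [], "discontinued_at": 18, "followup_end": 52},
]
rate = on_treatment_rate(records)
```

Computing such a rate separately by arm makes it visible when exposure times differ between the treatments being compared, which is the situation in which particular care is required.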

Principal stratum strategies

This relates to the population of interest (see A.3.3.). The target population might be taken to be the “principal stratum” (see Glossary) in which an intercurrent event would occur. Alternatively, the target population might be taken to be the principal stratum in which an intercurrent event would not occur. The clinical question of interest relates to the treatment effect only within the principal stratum. For example, it might be desired to know a treatment effect on severity of infections in the principal stratum of patients becoming infected after vaccination. Alternatively, a toxicity might prevent some patients from continuing the test treatment, but it would be desired to know the treatment effect among patients who are able to tolerate the test treatment.

It is important to distinguish “principal stratification” (see Glossary), which is based on potential intercurrent events (for example, subjects who would discontinue therapy if assigned to the test product), from subsetting based on actual intercurrent events (subjects who discontinue therapy on their assigned treatment). The subset of subjects who experience an intercurrent event on the test treatment will often be a different subset from those who experience the same intercurrent event on control. Treatment effects defined by comparing outcomes in these subsets confound the effects of the different treatments with the differences in outcomes possibly due to the differing characteristics of the subjects.
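The confounding described above can be illustrated with a small simulation. It is a sketch under strong, hypothetical assumptions: there is no true treatment effect, 30% of subjects are "frail" and would discontinue if assigned to the test treatment, nobody discontinues on control, and frail subjects have worse outcomes. In a simulation the potential intercurrent event is known for every subject; in a real trial it is not observed for subjects assigned to control and stratum membership would have to be inferred under assumptions.

```python
import random
from statistics import mean


def simulate_trial(n=20000, seed=1):
    """Simulate a trial with NO true treatment effect on the outcome."""
    rng = random.Random(seed)
    subjects = []
    for _ in range(n):
        frail = rng.random() < 0.3
        arm = rng.choice(["test", "control"])
        subjects.append({
            "arm": arm,
            "would_discontinue_on_test": frail,       # potential intercurrent event
            "discontinued": frail and arm == "test",  # actual intercurrent event
            "outcome": rng.gauss(10.0 if frail else 20.0, 2.0),
        })
    return subjects


def mean_outcome(subjects, arm, keep):
    return mean(s["outcome"] for s in subjects if s["arm"] == arm and keep(s))


subjects = simulate_trial()

# Subsetting on ACTUAL events: completers on test versus completers on control.
naive = (mean_outcome(subjects, "test", lambda s: not s["discontinued"])
         - mean_outcome(subjects, "control", lambda s: not s["discontinued"]))

# Principal stratum: subjects who WOULD tolerate the test treatment, in both arms.
principal = (mean_outcome(subjects, "test", lambda s: not s["would_discontinue_on_test"])
             - mean_outcome(subjects, "control", lambda s: not s["would_discontinue_on_test"]))
```

Under these assumptions `naive` shows a clear apparent benefit although none exists, because the test-arm completers exclude the frail subjects while the control-arm completers do not, whereas `principal` stays close to zero.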

A.3.3. Estimand Attributes

The attributes below are used to construct the estimand, defining the treatment effect of interest.

The treatment condition of interest and, as appropriate, the alternative treatment condition to which comparison will be made (referred to as “treatment” through the remainder of this document). These might be individual interventions, combinations of interventions administered concurrently, e.g. as add-on to standard of care, or might consist of an overall regimen involving a complex sequence of interventions (see Treatment Policy and Hypothetical strategies under A.3.2.).

The population of patients targeted by the clinical question. This will be represented by the entire trial population, a subgroup defined by a particular characteristic measured at baseline, or a principal stratum defined by the occurrence (or non-occurrence, depending on context) of a specific intercurrent event (see Principal Stratum strategies under A.3.2.).

The variable (or endpoint) to be obtained for each patient that is required to address the clinical question. The specification of the variable might include whether the patient experiences an intercurrent event (see Composite Variable and While on Treatment strategies under A.3.2.).

Precise specifications of treatment, population and variable are likely to address many of the intercurrent events considered in sponsor and regulator discussions of the clinical question of interest. The clinical question of interest in respect of any other intercurrent events will usually be reflected using the strategies introduced as treatment policy, hypothetical or while on treatment.

Finally, a population-level summary for the variable should be specified, providing a basis for comparison between treatment conditions.
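One way to keep these attributes explicit in planning documents or analysis code is a small record type. The sketch below is illustrative only: the class name, field names and example values are hypothetical, and the strategy names are taken from A.3.2.

```python
from dataclasses import dataclass, field
from typing import Dict

# Strategy names as introduced in A.3.2.
STRATEGIES = {
    "treatment policy",
    "hypothetical",
    "composite variable",
    "while on treatment",
    "principal stratum",
}


@dataclass(frozen=True)
class Estimand:
    treatment: str                 # treatment condition(s) being compared
    population: str                # patients targeted by the clinical question
    variable: str                  # endpoint obtained for each patient
    population_level_summary: str  # basis for comparison between treatments
    # Maps each intercurrent event to the strategy chosen for it.
    intercurrent_events: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self):
        unknown = set(self.intercurrent_events.values()) - STRATEGIES
        if unknown:
            raise ValueError(f"Unknown strategy: {sorted(unknown)}")


# Hypothetical example; the wording is not taken from any real protocol.
example = Estimand(
    treatment="intervention A added to background therapy B versus placebo added to B",
    population="entire trial population",
    variable="change from baseline in symptom score at week 24",
    population_level_summary="difference in means",
    intercurrent_events={
        "use of additional medication": "hypothetical",
        "discontinuation of treatment": "treatment policy",
    },
)
```

Recording one strategy per intercurrent event reflects the point that different strategies can be used for different events within the same estimand.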

When defining a treatment effect of interest, it is important to ensure that the definition identifies an effect due to treatment and not due to potential confounders such as differences in duration of observation or patient characteristics.

A.3.4. Considerations for Constructing an Estimand

The clinical questions of interest and associated estimands should be specified at the initial stages of planning any clinical trial. Precise specification of objectives for most trials will need to reflect discontinuation of treatment and use of additional or alternative treatments. In some settings terminal events, such as death, should be addressed. Some trial objectives can only be described with reference to clinical events, for example the duration of response in subjects who achieve a response.

The construction of an estimand should consider what is of clinical relevance for the particular treatment in the particular therapeutic setting. Considerations include the disease under study, the clinical context (e.g. the availability of alternative treatments), the administration of treatment (e.g. one-off dosing, short-term treatment or chronic dosing) and the goal of treatment (e.g. prevention, disease modification, symptom control). Also important is whether an estimate of the treatment effect can be derived that is reliable for decision making. For example, a clinical question on the treatment effect on clinical outcome regardless of which other therapies are to be used before that outcome is experienced differs to a clinical question on the treatment effect had no additional medication been available. Depending on the setting, either might represent a clinical question of interest. However, in both cases, a clinical trial designed to estimate these treatment effects will often include the possibility to use additional medications if medically required. For the former question, values after the use of additional treatment will be relevant. For the latter question, values after the additional treatment are not directly relevant since the values also reflect the impact of that additional medication. It should be agreed that reliable estimation is possible before the choice of estimand is finalised. This includes, for the latter question, the methods to replace observations that are not to be used in the analysis.

When constructing the estimand it is necessary to have a clear understanding of the treatment to which the clinical question of interest pertains (see A.3.3.). Clear specifications for the treatments of interest might already reflect multiple relevant intercurrent events. Specifically, a treatment might already reflect the clinical question of interest in respect of changes in background treatment, concomitant medications, use of additional or later-line therapies, treatment-switching and conditioning regimens. For example, it is possible to specify treatment as intervention A added to background therapy B, dosed as required. In that case, changes to the dose of background therapy B would not need to be considered as an intercurrent event. However, the use of an additional therapy would need to be considered as an intercurrent event. If use of any additional medication is also reflected, using the treatment policy strategy for example, then treatment might be specified as intervention A added to background therapy B, dosed as required, and with additional medication, as required. Alternatively, if the treatment is specified as intervention A, then both changes in background therapy and use of additional therapy would be addressed as intercurrent events.

Discussions should also consider whether specifications for the population and variable attributes should be used to reflect the clinical question of interest in respect of any intercurrent events. Strategies can then be considered for any other intercurrent events. Usually an iterative process will be necessary to reach an estimand that is of clinical relevance for decision making, and for which a reliable estimate can be made. Some estimands, in particular those for which the measurements taken are relevant to the clinical question, can often be robustly estimated making few assumptions. Other estimands may require methods of analysis with more specific assumptions that may be more difficult to justify and that may be more sensitive to plausible changes in those assumptions (see A.5.1.). Where significant issues exist in developing an appropriate trial design or in deriving an adequately reliable estimate for a particular estimand, an alternative estimand, trial design and method of analysis would need to be considered.

Avoiding or over-simplifying the process of discussing and constructing an estimand risks misalignment between trial objectives, trial design, data collection and method of analysis. Whilst an inability to derive a reliable estimate might preclude certain choices of strategy, it is important to proceed sequentially from the trial objective and an understanding of the clinical question of interest, and not for the choice of data collection and method of analysis to determine the estimand.

The experimental situation should also be considered. If the management of subjects (e.g. dose adjustment for intolerance, rescue treatment for inadequate response, burden of clinical trial assessments) under a clinical trial protocol is justified to be different to that which is anticipated in clinical practice, this might be reflected in the construction of the estimand.

Once constructed, the estimand should define a target of estimation clearly and unambiguously. Consider an intercurrent event of discontinuation of treatment; it is of utmost importance to distinguish between treatment effects of interest based on the principal stratum of patients who would be able to continue if administered the test treatment and the effect during continued treatment. Furthermore, neither of these should be taken to represent an effect if all patients can continue with treatment.

As stated above, when using the hypothetical strategy, some conditions are likely to be more acceptable for regulatory decision making than others. The hypothetical conditions described should therefore be justified for the quantification of an interpretable treatment effect that is relevant to inform the decisions to be taken by regulators, and use of the medicine in clinical practice. The question of what the values for the variable of interest would have been if rescue medication had not been available may be an important one. In contrast, the question of what the values for the variable of interest would have been under the hypothetical condition that subjects who discontinued treatment because of adverse drug reaction had in fact continued with treatment, might not be justifiable as being of clinical or regulatory interest. A clinical question of interest based on the effect if all subjects had been able to continue with treatment is not well-defined without a thorough discussion of the hypothetical conditions under which it is supposed that they would have continued. The inability to tolerate a treatment may constitute, in itself, evidence of an inability to achieve a favourable outcome.

Characterising beneficial effects using estimands based on the treatment policy strategy might also be more generally acceptable to support regulatory decision making, specifically in settings where estimands based on alternative strategies might be considered of greater clinical interest, but main and sensitivity estimators cannot be identified that are agreed to support a reliable estimate or robust inference. An estimand based on the treatment policy strategy might offer the possibility to obtain a reliable estimate of a treatment effect that is still relevant. In this situation, it is recommended to also include those estimands that are considered to be of greater clinical relevance and to present the resulting estimates along with a discussion of the limitations, in terms of trial design or statistical analysis, for that specific approach. When constructing estimands based on the treatment policy strategy, inference can be complemented by defining an additional estimand and analysis pertaining to each intercurrent event for which the strategy is used; for example, contrasting both the treatment effect on a symptom score and the proportion of subjects using additional medication under each treatment. Similarly, an estimand using a while on treatment strategy should usually be accompanied by the additional information on the time to intercurrent event distributions, and an estimand based on a principal stratum would usefully be accompanied by information on the proportion of patients in that stratum, if available.

The considerations informing the construction of estimand to support regulatory decision making based on a non-inferiority or equivalence objective may differ to those for the choice of estimand for a superiority objective. As explained in ICH E9, the problem facing the regulator in their decision making is different when based on non-inferiority or equivalence studies compared to superiority studies. In Section 3.3.2. it is stated that such trials are not conservative in nature and the importance of minimising the number of protocol violations and deviations, non-adherence and study withdrawals is indicated. In Section 5.2.1. it is described that the result of the Full Analysis Set (FAS) is generally not conservative and that its role in such trials should be considered very seriously. Estimands that are constructed with one or more intercurrent events accounted for using the treatment policy strategy present similar issues for non-inferiority and equivalence trials as those related to analysis of the FAS under the ITT principle. Responses in both treatment groups can appear more similar following discontinuation of randomised treatment or use of another medication for reasons that are unrelated to the similarity of the initially randomised treatments. Estimands could be constructed to directly address those intercurrent events which can lead to the attenuation of differences between treatment arms (e.g. discontinuations from treatment and use of additional medications). When selecting strategies, it might be important to distinguish between trials designed to detect whether differences exist between treatments containing the same or similar active substance (e.g. comparison of a biosimilar to a reference treatment) and trials where a non-inferiority or equivalence hypothesis is used in order to establish and quantify evidence of efficacy. An estimand can be constructed to target a treatment effect that prioritises sensitivity to detect differences between treatments, if appropriate for regulatory decision making.

A.4. IMPACT ON TRIAL DESIGN AND CONDUCT

The design of a trial needs to be aligned to the estimands that reflect the trial objectives. A trial design that is suitable for one estimand might not be suitable for other estimands of potential importance. Clear definitions for the estimands on which quantification of treatment effects will be based should inform the choices that are made in relation to trial design. This includes determining the inclusion and exclusion criteria that identify the target population, the treatments, including the medications that are allowed and those that are prohibited in the protocol, and other aspects of patient management and data collection. If interest lies, for example, in understanding the treatment effect regardless of whether a particular intercurrent event occurs, a trial in which the variable is collected for all subjects is appropriate. Alternatively, if the estimands that are required to support regulatory decision making do not require the collection of the variable after an intercurrent event, then the benefits of collecting such data for other estimands should be weighed against any complications and potential drawbacks of the collection.

Efforts should be made to collect all data that are relevant to support estimation, including data that inform the characterisation, occurrence and timing of intercurrent events. Data cannot always be collected. Certainly, subjects cannot be retained in a trial against their will, and in some trials missing data for some subjects is inevitable by design, such as administrative censoring in trials with survival outcomes. By contrast, the occurrence of intercurrent events such as discontinuation of treatment, treatment switching, or use of additional medication, does not imply that the variable cannot be measured thereafter, though the measures may not be relevant. For terminal events such as death, the variable cannot be measured after the intercurrent event, but neither should these data generally be regarded as missing.

Not collecting any data needed to assess an estimand results in a missing data problem for subsequent statistical inference. The validity of statistical analyses may rest upon untestable assumptions and, depending on the proportion of missing data, this may undermine the robustness of the results (see A.5.). A prospective plan to collect informative reasons for why data intended for collection are missing may help to distinguish the occurrence of intercurrent events from missing data. This in turn may improve the analysis and may also lead to a more appropriate choice of sensitivity analysis. For example, “loss to follow-up” may more accurately be recorded as “treatment discontinuation due to lack of efficacy”. Where that has been defined as an intercurrent event, this can be reflected through the strategy chosen to account for that intercurrent event and not as a missing data problem. To reduce missing data, measures can be implemented to retain subjects in the trial. However, measures to reduce or avoid intercurrent events that would normally occur in clinical practice risk reducing the external validity of the trial. For example, selection of the trial population or use of titration schemes or concomitant medications to mitigate the impact of toxicity might not be suitable if those same measures would not be implemented in clinical practice.

Randomisation and blinding remain cornerstones of controlled clinical trials. Design techniques for avoiding bias are addressed in Section 2.3. Certain estimands may necessitate, or may benefit from, use of trial designs such as run-in or enrichment designs, randomised withdrawal designs, or titration designs. It might be of interest to identify the principal stratum of subjects who can tolerate a treatment using a run-in period, in advance of randomising those subjects between test treatment and control. Dialogue between regulator and sponsor would need to consider whether the proposed run-in period is appropriate to identify the target population, and whether the choices made for the subsequent trial design (e.g. washout period, randomisation) support the estimation of the target treatment effect and associated inference. These considerations might limit the use of these trial designs, and use of that particular strategy.

A precise description of the treatment effects of interest should inform sample size calculations. Particular care should be taken when making reference to historical studies that might, implicitly or explicitly, have reported estimated treatment effects or variability based on a different estimand. Where all subjects contribute information to the analysis, and where the impact of the strategy to reflect intercurrent events is included in the effect size that is targeted and the expected variance, it is not usually necessary to additionally inflate the calculated sample size by the expected proportion of subject withdrawals from the trial.
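As a rough illustration of the point above, the sketch below computes a per-arm sample size for a two-arm comparison of means using the usual normal-approximation formula. The function name `n_per_arm` and all numbers are hypothetical. The key assumption is that the targeted difference `delta` and the standard deviation `sigma` already reflect the strategy chosen for the intercurrent events, so no further inflation for withdrawals is applied.

```python
import math
from statistics import NormalDist


def n_per_arm(delta, sigma, alpha=0.05, power=0.9):
    """Per-arm sample size for a two-sided, two-sample z-test on means.

    delta: targeted difference in means under the chosen estimand (assumed).
    sigma: common standard deviation expected under that estimand (assumed).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


# Hypothetical inputs: an effect diluted by treatment discontinuations and a
# somewhat larger variance, as might be expected under a treatment policy strategy.
n_undiluted = n_per_arm(delta=5.0, sigma=12.0)
n_treatment_policy = n_per_arm(delta=4.0, sigma=13.0)
```

In this sketch the smaller targeted effect and larger variance already drive the larger sample size, which is why a separate inflation factor for withdrawals would usually be unnecessary.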

Section 7.2. addresses issues related to summarising data across clinical trials. The need to have consistent definitions for the variables of interest is highlighted and this can be extended to the construction of estimands. Hence, in situations when synthesising evidence from across a clinical trial programme is envisaged at the planning stage, a suitable estimand should be constructed, included in the trial protocols, and reflected in the choices made for the design of the contributing trials. Similar considerations apply to the design of a meta-analysis, using estimated effect sizes from completed trials to determine non-inferiority margins, or the use of external control groups for the interpretation of single-arm trials. A naïve comparison between data sources, or integration of data from multiple trials without consideration and specification of the estimand that is addressed in each data presentation or statistical analysis, could be misleading.

More generally, a trial is likely to have multiple objectives translated into multiple estimands, each associated with statistical testing and estimation. The multiplicity issues arising should be addressed.

A.5. IMPACT ON TRIAL ANALYSIS

A.5.1. Main Estimation

An estimand for the effect of treatment relative to a control will be estimated by comparing the outcomes in a group of subjects on the treatment to those in a similar group of subjects on the control. For a given estimand, an aligned method of analysis, or estimator, should be implemented that is able to provide an estimate on which reliable interpretation can be based. The method of analysis will also support calculation of confidence intervals and tests for statistical significance. An important consideration for whether an interpretable estimate will be available is the extent of assumptions that need to be made in the analysis. Key assumptions should be stated explicitly together with the estimand and accompanying main and sensitivity estimators. Assumptions should be justifiable and implausible assumptions should be avoided. The robustness of the results to potential departures from the underlying assumptions should be assessed through an estimand-aligned sensitivity analysis (see A.5.2.). Estimation that relies on many or strong assumptions requires more extensive sensitivity analysis. Where the impact of deviations from assumptions cannot be comprehensively investigated through sensitivity analysis, that particular combination of estimand and method of analysis might not be acceptable for decision making.

All methods of analysis rely on assumptions, and different methods may rely on different assumptions even when aligned to the same estimand. Nevertheless, some kinds of assumption are inherent in all methods of analysis aligned to estimands that use each of the different strategies outlined; for example, the methodology for predicting the outcomes that would have been observed in the hypothetical scenario, or for identifying a suitable target population in a principal stratum strategy. Some examples are given below related to the different strategies used to reflect the occurrence of intercurrent events. The issues highlighted will be key components of discussion between sponsor and regulator in advance of an estimand, main analysis and sensitivity analysis being agreed.

Analysis aligned with a treatment policy strategy to address a given intercurrent event may entail stronger or weaker assumptions depending on the design and conduct of the trial. When most subjects are followed up even after the respective intercurrent event (e.g. discontinuation of treatment), the remaining problem of missing data may be relatively minor. In contrast, when observation is terminated after an intercurrent event, which is obviously undesirable in respect of this strategy, the assumption that (unobserved) outcomes for discontinuing subjects are similar to the (observed) outcomes for those who remain on treatment will often be implausible. An alternative approach to handle the missing data would need to be justified and sensitivity analysis will be expected.

Analysis aligned to a hypothetical strategy involves outcomes different from those actually observed; for example, outcomes if rescue medication had not been given when in fact it was. Observations before the rescue medication and observations on subjects who did not require rescue medication may be informative, but only under strong assumptions.

A composite variable strategy can avoid statistical assumptions about data after an intercurrent event by considering occurrence of the intercurrent event as a component of the outcome. The potential concern relates less to assumptions for estimation, and more to the interpretation of the estimated treatment effect. For the estimand to be interpretable, if scores are assigned for failure because the intercurrent event occurs, these should meaningfully reflect the lack of benefit to the patient (e.g. death may be reflected differently than discontinuation of treatment due to adverse event).
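A minimal sketch of a composite variable for a binary outcome follows. It assumes a hypothetical trial in which response means a change from baseline of at most `threshold`, and in which discontinuation due to toxicity is counted as failure. The function names, the threshold and the data are illustrative only.

```python
def composite_responder(change_from_baseline, discontinued_for_toxicity,
                        threshold=-10.0):
    """Return True only if no intercurrent event occurred and the change
    from baseline meets the (hypothetical) response threshold."""
    if discontinued_for_toxicity:
        return False  # the intercurrent event itself is scored as failure
    return change_from_baseline is not None and change_from_baseline <= threshold


def response_rate(subjects):
    """subjects: list of (change_from_baseline, discontinued_for_toxicity)."""
    flags = [composite_responder(change, stopped) for change, stopped in subjects]
    return sum(flags) / len(flags)


# Hypothetical arm: the second subject met the threshold but discontinued,
# and the fourth discontinued before any post-baseline measurement.
arm = [(-15.0, False), (-12.0, True), (-5.0, False), (None, True)]
rate = response_rate(arm)
```

No assumption about post-event data is needed here, but whether scoring discontinuation the same as a poor measured outcome is clinically meaningful remains a matter of interpretation.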

Estimands constructed based on a while on treatment strategy can be estimated provided outcomes are collected up to the time of the intercurrent event. Again, the crucial assumptions concern interpretation. Take discontinuation of treatment by way of example. Outcomes while on treatment may be improved but the treatment may also shorten, or lengthen, the treatment period by provoking, or delaying, discontinuations, and both these effects should be considered in interpretation and assessment of clinical benefit.
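The sketch below illustrates one way the variable could be derived under a while on treatment strategy: outcomes are summarised up to the intercurrent event, and the day of the last on-treatment observation is returned alongside, since the length of the treatment period matters for interpretation. The function name, the choice of a simple mean and the data layout are assumptions made for illustration.

```python
def on_treatment_summary(visits, event_day=None):
    """Summarise outcomes measured up to the intercurrent event.

    visits: list of (day, value) pairs for one subject.
    event_day: day of the intercurrent event, or None if it did not occur.
    Returns (mean of retained values, day of last retained visit),
    or (None, 0) if no visit took place before the event.
    """
    kept = [(day, value) for day, value in visits
            if event_day is None or day <= event_day]
    if not kept:
        return None, 0
    values = [value for _, value in kept]
    return sum(values) / len(values), max(day for day, _ in kept)


visits = [(7, 4.0), (14, 6.0), (28, 8.0)]
stopped_early = on_treatment_summary(visits, event_day=20)  # uses days 7 and 14
completed = on_treatment_summary(visits)                    # uses all visits
```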

Analysis aligned to a principal stratum strategy usually requires strong assumptions. For example, some principal stratification methods infer membership of a principal stratum from baseline characteristics of the subjects, but the correctness of this inference may be difficult to assess. This difficulty cannot be avoided by simplified methods, however. For example, simply comparing subjects who do not have an intercurrent event on the test treatment to those who do not have an event on control, assuming intercurrent events are unrelated to treatment, is very difficult to justify.
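To make the problem with the simplified comparison concrete, the sketch below uses an entirely hypothetical population in which both potential outcomes and both potential event statuses are known for every subject, which is never the case in a real trial. Each arm is idealised as a copy of the whole population. All names and numbers are invented for illustration.

```python
from statistics import mean

# Each hypothetical subject: (event_on_test, event_on_control, y_test, y_control)
population = (
    [(False, False, 10.0, 8.0)] * 50   # would have no event on either treatment
    + [(True, False, 4.0, 3.0)] * 30   # would have the event on test only
    + [(True, True, 2.0, 2.0)] * 20    # would have the event on both treatments
)


def principal_stratum_effect(pop):
    """Effect among subjects who would have no event on either treatment."""
    stratum = [s for s in pop if not s[0] and not s[1]]
    return mean(s[2] for s in stratum) - mean(s[3] for s in stratum)


def naive_subset_difference(pop):
    """Compare subjects observed to be event-free in each (idealised) arm."""
    test_arm = [s[2] for s in pop if not s[0]]
    control_arm = [s[3] for s in pop if not s[1]]
    return mean(test_arm) - mean(control_arm)


true_effect = principal_stratum_effect(population)   # 2.0
naive_effect = naive_subset_difference(population)   # 3.875
```

The naive difference is larger than the effect in the principal stratum because the event-free control subjects include subjects who would not have tolerated the test treatment, so the two subsets do not contain similar subjects.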

Even after defining estimands that address intercurrent events in an appropriate manner and making efforts to collect the data required for estimation (see A.4.), some data may still be missing, including e.g. administrative censoring in trials with survival outcomes. Failure to collect relevant data should not be confused with the choice not to collect, or to collect and not to use, data made irrelevant by an intercurrent event. For example, data that were intended to be collected after discontinuation of trial medication to inform an estimand based on the treatment policy strategy are missing if uncollected; however, the same data points might be irrelevant for another strategy, and thus, for the purpose of that second estimand, are not missing if uncollected. Where those efforts to collect data are not successful it becomes necessary to make assumptions to handle the missing data in the statistical analysis. Handling of missing data should be based on clinically plausible assumptions and, where possible, guided by the strategies employed in the description of the estimand. The approach taken may be based on observed covariates and post-baseline data from individual subjects and from other similar subjects. Criteria to identify similar subjects might include whether or not the intercurrent event has occurred. For example, for subjects who discontinue treatment without further data being collected, a model may use data from other subjects who discontinued treatment but for whom data collection has continued.
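The last example in the paragraph above can be sketched as follows. This is a deliberately simplified single imputation, assuming one outcome per subject and using the mean of followed-up discontinuers in the same arm as the imputed value; a real analysis would more plausibly use a model-based multiple imputation that reflects uncertainty. The function name and record layout are hypothetical.

```python
from statistics import mean


def impute_from_retrieved_dropouts(subjects):
    """Fill missing outcomes of discontinuers from discontinuers in the same
    arm whose outcomes were still collected.

    subjects: list of dicts with keys 'arm', 'discontinued' and 'outcome'
    (None if missing). Returns a new list; the input is left unchanged.
    """
    filled = []
    for s in subjects:
        if s["outcome"] is None and s["discontinued"]:
            donors = [d["outcome"] for d in subjects
                      if d["arm"] == s["arm"] and d["discontinued"]
                      and d["outcome"] is not None]
            # Leave the value missing if no similar subject is available.
            s = {**s, "outcome": mean(donors) if donors else None}
        filled.append(s)
    return filled


subjects = [
    {"arm": "test", "discontinued": True, "outcome": 4.0},
    {"arm": "test", "discontinued": True, "outcome": 6.0},
    {"arm": "test", "discontinued": True, "outcome": None},
    {"arm": "test", "discontinued": False, "outcome": 10.0},
]
completed = impute_from_retrieved_dropouts(subjects)
```

Subjects who remained on treatment are not used as donors here, reflecting the idea that similarity may be judged by whether the intercurrent event has occurred.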

A.5.2. Sensitivity Analysis

A.5.2.1. Role of Sensitivity Analysis

Inferences based on a particular estimand should be robust to limitations in the data and deviations from the assumptions used in the statistical model for the main estimator. This robustness is evaluated through a sensitivity analysis. Sensitivity analysis should be planned for the main estimators of all estimands that will be important for regulatory decision making and labelling in the product information. This can be a topic for discussion and agreement between sponsor and regulator.

The statistical assumptions that underpin the main estimator should be documented. One or more analyses, focused on the same estimand, should then be pre-specified to investigate these assumptions with the objective of verifying whether or not the estimate derived from the main estimator is robust to departures from its assumptions. This might be characterised as the extent of departures from assumptions that change the interpretation of the results in terms of their statistical or clinical significance (e.g. tipping point analysis).
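A tipping point analysis of the kind mentioned above might be sketched as follows. The sketch assumes a continuous outcome where higher values are better, missing outcomes in the test arm only, and a crude single imputation in which each missing value is set to the observed test-arm mean minus a penalty `delta`. It then reports the smallest penalty at which a large-sample z-test is no longer significant. All names and data are hypothetical, and single imputation understates the variance, so this is illustrative rather than a recommended method.

```python
from statistics import NormalDist, mean, variance


def z_test_p(x, y):
    """Two-sided p-value of a large-sample z-test for a difference in means."""
    se = (variance(x) / len(x) + variance(y) / len(y)) ** 0.5
    z = (mean(x) - mean(y)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def tipping_point(treat_obs, n_missing, control, deltas, alpha=0.05):
    """Return the first delta at which significance is lost, or None."""
    base = mean(treat_obs)
    for delta in deltas:
        # Impute each missing test-arm outcome as the observed mean minus delta.
        imputed = treat_obs + [base - delta] * n_missing
        if z_test_p(imputed, control) >= alpha:
            return delta
    return None


treat_obs = [12.0, 15.0, 14.0, 16.0, 13.0, 15.0, 14.0, 17.0]
control = [10.0, 11.0, 12.0, 10.0, 13.0, 11.0, 12.0, 11.0, 10.0, 12.0]
tip = tipping_point(treat_obs, n_missing=2, control=control, deltas=range(0, 11))
```

Whether a penalty of the size found is clinically plausible is then a matter for judgement: the larger the departure needed to change the conclusion, the more robust the result.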

Distinct from sensitivity analysis, where investigations are conducted with the intent of exploring robustness to departures from assumptions, other analyses that are conducted in order to more fully investigate and understand the trial data can be termed “supplementary analysis” (see Glossary; A.5.3.). Where the primary estimand(s) of interest is agreed between sponsor and regulator, the main estimator is pre-specified unambiguously, and the sensitivity analysis verifies that the estimate derived is reliable for interpretation, supplementary analyses should generally be given lower priority in assessment.

A.5.2.2. Choice of Sensitivity Analysis

When planning and conducting a sensitivity analysis, altering multiple aspects of the main analysis simultaneously can make it challenging to identify which assumptions, if any, are responsible for any potential differences seen. It is therefore desirable to adopt a structured approach, specifying the changes in assumptions that underlie the alternative analyses, rather than simply comparing the results of different analyses based on different sets of assumptions. The need for analyses varying multiple assumptions simultaneously should then be considered on a case by case basis. A distinction between testable and untestable assumptions may be useful when assessing the interpretation and relevance of different analyses.

The need for sensitivity analysis in respect of missing data is established and retains its importance in this framework. Missing data should be defined and considered in respect of a particular estimand (see A.4.). The distinction between data that are missing in respect of a specific estimand and data that are not directly relevant to a specific estimand gives rise to separate sets of assumptions to be examined in sensitivity analysis.

A.5.3. Supplementary Analysis

Interpretation of trial results should focus on the main estimator for each agreed estimand providing that the corresponding estimate is verified to be robust through the sensitivity analysis. Supplementary analyses for an estimand can be conducted in addition to the main and sensitivity analysis to provide additional insights into the understanding of the treatment effect. They generally play a lesser role for interpretation of trial results. The need for, and utility of, supplementary analyses should be considered for each trial.

Section 5.2.3. indicates that it is usually appropriate to plan for analyses based on both the FAS and the Per Protocol Set (PPS) so that differences between them can be the subject of explicit discussion and interpretation. Consistent results from analyses based on the FAS and the PPS are indicated as increasing confidence in the trial results. Section 5.2.2. also describes that results based on a PPS might be subject to severe bias. In respect of the framework presented in this addendum, it may not be possible to construct a relevant estimand to which analysis of the PPS is aligned. As noted above, analysis of the PPS does not achieve the goal of estimating the effect in any principal stratum, for example, in those subjects able to tolerate and continue to take the test treatment, because it may not compare similar subjects on different treatments.

Protocol violations and deviations might exclude subjects from the PPS, for example by having a visit outside a time window, without an intercurrent event necessarily having occurred. Likewise, subjects could experience an intercurrent event, such as death, without having deviated from the protocol. Notwithstanding the differences between violations and deviations from the protocol and intercurrent events, events likely to affect the interpretation or existence of measurements are considered in the description of the estimand. Estimands might be constructed, with aligned method of analysis, that better address the objective usually associated with the analysis of the PPS. If so, analysis of the PPS might not add additional insights.

A.6. DOCUMENTING ESTIMANDS AND SENSITIVITY ANALYSIS

A trial protocol should define and specify explicitly a primary estimand that corresponds to the primary trial objective. The protocol and the analysis plan should pre-specify the main estimator that is aligned with the primary estimand and leads to the primary analysis, together with a suitable sensitivity analysis to explore the robustness under deviations from its assumptions. Estimands for secondary trial objectives (e.g. related to secondary variables) that are likely to support regulatory decisions should also be defined and specified explicitly, each with a corresponding main estimator and a suitable sensitivity analysis. Additional trial objectives may be considered for exploratory purposes, leading to additional estimands.

The choice of the primary estimand will usually be the main determinant for aspects of trial design, conduct and analysis. Following usual practices, these aspects should be well documented in the trial protocol. If secondary estimands are of key interest, these considerations may be extended to support these as needed and should be documented as well. Beyond these aspects, the conventional considerations for trial design, conduct and analysis remain the same.

While it is to the benefit of the sponsor to have clarity on what is being estimated, it is not a regulatory requirement to document an estimand for each exploratory objective.

Results from the main, sensitivity and supplementary analyses should be reported systematically in the clinical trial report, specifying whether each analysis was pre-specified, introduced while the trial was still blinded, or performed post hoc. Summaries of the number and timings of each intercurrent event in each treatment group should be reported.
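A summary of the number and timing of intercurrent events per treatment group could be tabulated along the following lines. The record layout, the function name and the choice of the median as the timing summary are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import median


def summarise_intercurrent_events(records):
    """records: iterable of (arm, event_type, day_of_event).

    Returns {(arm, event_type): (number of events, median day of event)}.
    """
    days = defaultdict(list)
    for arm, event_type, day in records:
        days[(arm, event_type)].append(day)
    return {key: (len(values), median(values))
            for key, values in sorted(days.items())}


# Hypothetical event log
records = [
    ("test", "discontinuation", 30),
    ("test", "discontinuation", 50),
    ("control", "rescue medication", 20),
    ("test", "rescue medication", 40),
    ("control", "discontinuation", 60),
]
summary = summarise_intercurrent_events(records)
```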

Changes to the estimand during the trial can be problematic and can reduce the credibility of the trial. Where intercurrent events that were not foreseen at the design stage are identified during the conduct of the trial, the discussion of how they are addressed should cover not only the choices made for the analysis, but also the effect on the estimand, i.e. on the description of the treatment effect that is being estimated, and on the interpretation of the trial results. A change to the estimand should usually be reflected through amendment to the protocol.

GLOSSARY

Term	Content
Estimand:	A precise description of the treatment effect reflecting the clinical question posed by the trial objective. It summarises at a population-level what the outcomes would be in the same patients under different treatment conditions being compared.
Estimate:	A numerical value computed by an estimator.
Estimator:	A method of analysis to compute an estimate of the estimand using clinical trial data.
Intercurrent Events:	Events occurring after treatment initiation that affect either the interpretation or the existence of the measurements associated with the clinical question of interest. It is necessary to address intercurrent events when describing the clinical question of interest in order to precisely define the treatment effect that is to be estimated.
Missing Data:	Data that would be meaningful for the analysis of a given estimand but were not collected. They should be distinguished from data that do not exist or data that are not considered meaningful because of an intercurrent event.
Principal Stratification:	Classification of subjects according to the potential occurrence of an intercurrent event on all treatments. With two treatments, there are four principal strata with respect to a given intercurrent event: subjects who would not experience the event on either treatment, subjects who would experience the event on treatment A but not B, subjects who would experience the event on treatment B but not A, and subjects who would experience the event on both treatments. In this document a principal stratum refers to any of the strata (or combination of strata) defined by principal stratification.
Sensitivity Analysis:	A series of analyses conducted with the intent to explore the robustness of inferences from the main estimator to deviations from its underlying modelling assumptions and limitations in the data.
Supplementary Analysis:	A general description for analyses that are conducted in addition to the main and sensitivity analysis with the intent to provide additional insights into the understanding of the treatment effect.

ICH E9

ICH E9(R1) 临床试验中的估计目标与敏感性分析（E9指导原则增补文件）

ICH E9(R1) Addendum: Statistical Principles for Clinical Trials

A.1. 目的和范围

为了给制药公司、监管机构、患者、医生和其他利益相关方的决策提供正确的信息，应明确描述特定医疗条件下治疗（药物）的获益和风险。如果不能对此进行明确描述，报告的“治疗效应”可能会被误解。本增补提出了一个结构化的框架，以加强参与制定临床试验目的、设计、实施、分析和解释的多学科间的交流，并加强申办方和监管机构之间关于临床试验中治疗效应的沟通。

构建相应临床问题的“估计目标”（见词汇表；A.3.）有助于精确描述治疗效应，这就需要深思熟虑地定义“伴发事件” （见词汇表；A.3.1.），如终止分配的治疗，使用额外或其他治疗，或终末事件（如死亡）等。估计目标的描述应该反映出与这些伴发事件相关的临床问题，并且本增补介绍了反映不同临床问题的策略。在描述临床问题时，策略的选择可能会影响到如何反映试验的更加常规的属性，例如治疗、人群或相关的变量（终点）。

临床试验数据的统计分析应当与估计目标对应。本增补阐明了“敏感性分析”（见词汇表）在探索主要统计分析结论稳健性中的作用。

本增补中，对原始ICH E9的引用采用x.y格式，对本增补的引用采用A.x.y.格式。

本增补就以下若干方面澄清和扩展了ICH E9。第一，ICH E9介绍了随机对照试验中对应于疗法策略的意向治疗（ITT）原则，据此对受试者进行随访、评估和分析，而不考虑其是否依从计划的治疗过程，这表明保持随机化为统计学检验提供了一个坚实的基础。ITT 原则具有以下三个含义。首先，试验分析应包括与研究问题相关的所有受试者。其次，受试者应按随机化时的分配纳入分析。最后，根据ITT原则（见ICH E9词汇表）的定义，无论是否依从预定的治疗过程，都应对受试者进行随访和评估，并在分析中使用这些评估。毫无疑问，随机化是对照临床试验的基石，分析时应最大限度地利用随机化的这一优势。然而，根据ITT 原则估计治疗效应能否总是代表与监管和临床决策最相关的治疗效应，这个问题仍然悬而未决。本增补中概述的框架为描述不同的治疗效应提供了基础，并提出了试验设计和分析需考虑的要点，以便估计治疗效应，为决策提供可靠依据。

第二，本增补重新审视了通常归为数据处理和“缺失数据”（见词汇表）的一些问题，并提出了两个重要的区别。首先，增补对终止随机分配的治疗和退出研究加以区分。前者代表一个伴发事件，需通过在试验目的中对估计目标的精确说明加以解决；后者导致缺失数据，需在统计分析中加以解决。例如，考虑在肿瘤学试验中转组治疗的受试者，以及由于试验完成而无法观测到结局事件的受试者。前者代表伴发事件，关于该事件的临床问题应明确。后者属于管理性删失，需要在统计分析中作为缺失数据问题加以解决。估计目标的清晰性为计划需要收集哪些数据提供了依据，以及哪些数据如果未被收集到即为缺失数据问题，需要在统计分析中加以解决。然后，可以选择解决缺失数据问题的方法，以与估计目标一致。其次，增补强调了不同伴发事件的不同影响。诸如终止治疗、转组治疗或使用额外药物等事件可能导致变量的后续观测值即使可以收集到数据也与估计目标不相关或难以解释。而对于死亡的受试者，死亡后的观测值是不存在的。

第三，在框架中考虑了与分析集概念相关的问题。第 5.2.节强烈建议优效性试验的分析基于全分析集，即尽可能包括所有随机化受试者的分析集。然而，试验往往包括对同一受试者的重复观测。某些受试者按计划收集的观测值可能被认为是无关的或难以解释的，剔除这些观测值，与从全分析集中完全剔除受试者可能具有类似的后果，即没有完全保留最初的随机化。这样做的一个后果是，随机化赋予关于治疗效应的检验假设的理论优势获益以及平衡基线混杂因素的实际获益可能被削弱。另外，有意义的结局变量取值可能不存在，例如当受试者已死亡。第 5.2.节没有直接阐明这些问题。这些问题要在考虑伴发事件的前提下，通过仔细定义关注的治疗效应来进行明确，既要确定要包括在治疗效应估计中的受试者人群，又要确定每个受试者包括在分析中的观测值。本增补也重新审视了使用符合方案集来分析的意义和作用，尤其是，是否需要用比分析符合方案集更能减少偏倚、更有可解读性的方式，来研究方案违背和偏离的影响。

最后，在敏感性分析部分进一步讨论了稳健性的概念（见1.2.）。特别区分了所选分析方法的假设的敏感性，以及分析方法选择上的敏感性。通过精确说明已达成共识的估计目标，以及与估计目标一致的分析方法且其预先设定的细节描述达到能使第三方精确地重现分析结果的程度，这样，监管机构对于一个特定分析可聚焦于假设偏离和数据局限的敏感性。

无论是基于有效性或安全性的治疗效应估计，还是对治疗效应相关假设的检验，本增补中概述的原则均适用。虽然主要关注的是随机临床试验，但这些原则也同样适用于单臂试验和观察性研究。该框架适用于任何数据类型，包括纵向数据、首次事件发作时间数据和复发事件数据。对于确证性临床试验和用于产生确证性结论的跨试验整合数据，监管部门对所述原则的应用将更为关注。

A.2. 将计划、设计、实施、分析和解释协调一致的框架

试验计划应按顺序进行（图 1）。应通过定义合适的估计目标，将明确的试验目的转化为关键的临床所关注的问题。估计目标根据特定的试验目的定义估计的目标（即“要估计什么”，见A.3.），然后可以选择合适的估计方法（即分析方法，称为主“估计方法”，见词汇表）（见 A.5.1.）。主估计方法将以特定假设为基础，为了探索根据主估计方法所作推断对偏离其基本假设的稳健性，应针对同一估计目标采用一种或多种形式进行敏感性分析（见A.5.2.）。

图1：协调估计的目标、估计的方法和敏感性分析，使其与给定试验目的对应

该框架有助于制定适当的试验计划，以明确区分估计的目标（试验目的，估计目标）、估计的方法（估计方法）、数值结果（“估计值”，见词汇表）和敏感性分析。这将有助于申办方的试验计划制定和监管机构的审评工作，并在双方讨论临床试验设计的适宜性和临床试验结果的解释时增强交流。

指定适当的估计目标（见 A.3.）通常是试验设计、实施（见A.4.）和分析（见A.5.）方面的主要决定因素。

A.3. 估计目标

药物开发和批准的核心问题是明确治疗效应是否存在，并估计其大小：如何比较相同受试者接受不同治疗的结局（即，如果受试者未接受治疗或接受不同治疗）。估计目标是对治疗效应的精确描述，反映了既定临床试验目的提出的临床问题。它在群体层面上总结了同一批患者在不同治疗条件下比较的结果。估计的目标将在临床试验之前定义。一旦定义了估计的目标，即可设计试验以可靠地估计治疗效应。

估计目标的描述涉及特定属性的精确说明，这些属性不仅应基于临床考虑而制定，还应基于所关注的临床问题中如何反映伴发事件。第 A.3.1.节介绍了伴发事件。第 A.3.2.节介绍了各种策略，来描述与伴发事件有关的问题。第 A.3.3.节描述了估计目标的属性，第 A.3.4.节则提出了估计目标构建的考虑要点。理解不同策略之间的差异，并精确阐明哪些策略用于构建估计目标，这一点至关重要。

A.3.1. 临床问题中反映的伴发事件

伴发事件是指治疗开始后发生的事件，可影响与临床问题相关的观测结果的解读或存在。在描述临床问题时，有必要阐明伴发事件，以便准确定义需要估计的治疗效应。

在描述治疗效应时需要考虑伴发事件，因为变量的观测结果可能受伴发事件的影响，而伴发事件的发生可能取决于治疗。例如，两名患者可能最初暴露于相同的治疗并提供相同的结局观测值，但如果其中一名患者接受了其他药物治疗，则两名患者之间，观测值所反映的治疗的信息会有所不同。此外，患者接受的治疗会影响到他们是否需要服用其他用药，以及是否可以继续接受治疗。与缺失数据不同，伴发事件不应被认为是临床试验中需要避免的缺陷。在临床试验中发生的终止既定治疗、使用其他药物和其他此类事件在临床实践中也可能发生，因此在定义临床问题时需要明确考虑这些事件发生的可能。

可影响观测结果解释的伴发事件包括终止所分配的治疗，和使用额外或其他治疗。使用额外或其他疗法可以有多种形式，包括改变基础治疗或合并治疗、转组治疗。影响观测结果存在的伴发事件包括终末事件，例如死亡和腿截肢（当评估糖尿病性足溃疡的症状时），而且这些事件不是变量本身的一部分。当某些临床事件的发生或不发生定义了一个主层时（见 A.3.2.），这些事件也可以是伴发事件。例如，肿瘤领域中在评估缓解持续时间疗效时定义客观缓解的肿瘤缩小；对于初始未感染的接种疫苗受试者在评估感染严重程度疗效时的感染发生。

伴发事件可能仅由事件本身确定，如终止治疗，或可能有更详细的定义。详细的定义例如，可明确说明事件发生的原因，如因毒性作用终止治疗，或因缺乏疗效而终止治疗；事件可能需要达到一定量级或程度，如使用超过规定时间或剂量的其他药物；或明确说明事件发生的时机，可能与其对变量评估的接近程度有关。一些事件会无限期地影响结局观测值的解释，例如终止治疗，而另一些事件只会暂时影响，例如短期使用其他治疗。事实上，额外或其他治疗可以是多样的；可以是替代或补充受试者获益不足时的治疗，或作为对既定治疗不耐受的另一种选择，或作为控制疾病暂时急性发作的短期急性治疗。在临床试验中，额外或其他治疗通常是指诸如基础治疗、补救药物和禁用药物，要区分它们的不同作用以对其分别考虑。如果要使用不同的策略，则需要额外的详细信息，确定不同的伴发事件。例如，如果伴发事件不仅取决于未继续治疗，还取决于与未继续治疗相关的原因、程度或时机，则应在临床试验中准确定义和记录该附加信息。理论上，描述伴发事件可能体现治疗和随访非常具体的细节，例如长期治疗的单次漏服或日间服药的错误时间。如果预期这些具体标准不会影响对变量的解释，则不需要将它们作为伴发事件处理。

如上所述，在构建估计目标时需要考虑伴发事件。因为估计目标要在试验设计之前进行定义，所以无论是退出研究还是其他缺失数据的原因（例如生存结局的试验中的管理性删失）本身都不是伴发事件。退出试验的受试者在退出前可能已经发生了伴发事件。

A.3.2. 在定义临床问题时解决伴发事件的策略

下面列出了多种策略，每种策略又体现了对于特定伴发事件的不同临床问题。无论是否使用如下命名规则，构建估计目标时策略的选择都必须清晰明确。无需使用相同的策略来处理所有的伴发事件。事实上，通常会使用不同的策略来明确体现不同伴发事件的临床问题。第 A.3.4.节给出了一些在构建估计目标时策略选择上的考虑。

疗法策略

疗法策略下伴发事件的发生与定义治疗效应无关，即无论是否发生伴发事件，均会使用相关变量的值。例如，将使用其他药物治疗作为伴发事件时，规定无论患者是否服用其他药物，都使用相关变量的值。

对于患者是否继续治疗以及患者的其他治疗（基础或合并治疗）是否有变化等伴发事件，在疗法策略中被视为治疗的一部分。基于这种情况的比较就体现了ICH E9所阐述的ITT原则，比较结果亦是疗法策略下的治疗效应。

一般情况下，对于终末事件类型的伴发事件，不能采用疗法策略，原因是该类伴发事件后变量的值不再存在。例如，在死亡之后变量是无法观测的，因此不能基于此策略构建估计目标。

假想策略

假想策略设想一种没有发生伴发事件的情景：此时，体现临床问题的变量值是在所假设的情景下采用的变量值。

存在各种各样的假设情景，但其中有些情景更具临床或监管意义。例如，在与可实施试验条件不同的条件下的治疗效应可能具有临床或监管重要性。具体而言，当出于伦理原因必须提供额外药物治疗时，可能要考虑未提供额外药物情形下的治疗效应。一个非常不同的假设情景可能是假定伴发事件不会发生，或者会发生不同的伴发事件。例如，对于因发生不良事件而终止治疗的受试者，可考虑同一受试者没有发生不良事件或即使发生不良事件仍然继续治疗的情景。这种假设情景的临床和监管意义有限，并且通常需要清楚地理解伴发事件或其后果在临床实践与临床试验中为什么不同以及如何不同。

如果提出了一个假想策略，应该明确具体的假设情景是什么。举例来说，诸如“如果患者未服用额外药物”之类的措辞可能会导致混淆，因为不清楚患者是因为没有额外药物可用而未服用，还是该患者不需要服用额外药物而未服用。

复合变量策略

复合变量策略与关注的变量有关（见 A.3.3.）。伴发事件本身可提供关于患者结局的信息，因此将其纳入变量的定义之中。例如，由于毒性而终止治疗的患者可能被认为治疗失败。如果变量已被定义为成功或失败，因毒性终止治疗将被认为是另一种形式的失败。复合变量策略不仅限于二分类变量，也可以是连续型变量。例如，在观测生理功能的试验中，死亡的受试者可以用某一数值代表生理功能缺失。当变量原始观测值可能不存在或没有意义，但是伴发事件本身能够体现患者结局（如患者死亡）时，可将复合变量策略视为遵循意向治疗原则的策略。

终末事件，如死亡，可能是需要采用复合策略的最突出例子。如果某种治疗可以挽救生命，可能会关注其对存活患者的各种指标的作用，但是，如果汇总指标仅关注存活患者的一些数值指标的平均值是不够的，要同时关注数值指标和是否生存。例如，肿瘤试验中的无进展生存期衡量了肿瘤生长和生存组合在一起的治疗效应。

在治策略

在治策略关注在伴发事件发生之前的治疗效应。该策略的具体术语将取决于相关伴发事件；例如，当将死亡视为伴发事件时，可以称为“在世策略”。

如果一个变量被重复测量，则伴发事件发生前的所有观测值都可能被认为与临床问题相关，而不是所有受试者在相同固定时间点的值。这也适用于二分类结局在伴发事件之前发生的情况。例如，处于终末期的受试者可能会因为死亡而终止对症治疗，但可以根据死亡前症状的缓解情况评估治疗效果。还有一种情形，受试者可能终止治疗，此时评估其暴露于治疗期间药物不良反应的风险是值得关注的。

因此，在治策略与复合变量策略类似，会影响变量的定义。在这种情况下，在治策略通过将相应的观测时间限制在伴发事件之前来影响。如果各治疗组间的伴发事件的发生率不同，则尤其需要谨慎（见A.3.3.）。

主层策略

主层策略与人群有关（见 A.3.3.）。可认为目标人群是会发生伴发事件的“主层”（见词汇表）。或者，目标人群是不会发生伴发事件的主层。临床问题仅在该主层中与治疗效应相关。例如，在接种疫苗后仍然感染的患者主层中，可能需要了解针对感染严重程度的治疗效应。或者，毒性可能会使一些患者无法继续接受试验药物，但需要了解能够耐受试验药物的患者的治疗效应。

区分“主层”和子集很重要。“主层”（见词汇表）是基于潜在的伴发事件（例如，若分配到试验组将终止治疗的受试者），而“子集”是基于实际发生的伴发事件（终止既定治疗的受试者）。在试验组发生伴发事件的受试者子集通常与对照组发生相同伴发事件的受试者子集不同。比较这些子集的结局而定义的治疗效应，会混杂不同治疗间的真实效应和可能由于受试者不同特征导致的结局差异。

A.3.3. 估计目标的属性

下述属性用于构建估计目标，定义相关的治疗效应。

治疗（处理）：相关的治疗条件，以及适用时进行比较的其他治疗条件（在本文件其余部分中称为“治疗”）。这些可能是单独的干预措施，也可能是同时进行的干预措施的组合（例如作为加载治疗），或者是一个复杂干预序列组成的整体方案。（请见A.3.2.下的疗法策略和假想策略）。

人群：临床问题所针对的患者人群。可以是整个试验人群，也可以是按某种基线特征定义的亚组，或由特定伴发事件的发生（或不发生，视具体情况而定）定义的主层（参见 A.3.2.下的主层策略）。

变量（或终点）：为解决临床问题从每个患者获得的变量（或终点）。变量定义可能包括患者是否发生伴发事件（参见 A.3.2.下的复合变量策略和在治策略）。

其他伴发事件: 在申办方与监管机构关于相关临床问题的交流中，治疗、人群和变量的精确说明有助于解决一些伴发事件。针对任何其他伴发事件的临床相关问题，通常采用疗法策略、假想策略或在治策略来反映。

群体层面汇总：最后，应规定变量的群体层面的汇总统计量，为不同治疗之间的比较提供基础。

在定义治疗效应时，重要的是能够明确效应是由治疗引起的，而不是由潜在的混杂因素如观察期或患者特征的差异等引起的。

A.3.4. 构建估计目标的考量

临床问题及与之相关联的估计目标，应当在计划临床试验的初始阶段予以明确。大多数临床试验目的的精确说明，需要体现终止治疗、使用额外治疗或其他治疗的影响。在某些情况下，还应说明死亡一类的终末事件。有些试验目的只能参照临床事件来描述，例如获得应答的受试者的应答持续时间。

构建一个估计目标，应该考虑在特定医疗环境下特定治疗的临床相关性。需考虑的因素包括：所研究的疾病、临床情况（例如可供选择的其他治疗）、治疗方式（例如一次性给药、短期治疗或长期给药）和治疗目的（例如预防、疾病改善、症状控制）。同样重要的是，能否估计出可靠的治疗效应供决策之用。例如，在临床结局发生前，无论是否使用其他治疗情况下的治疗效应和假设没有额外药物可用情况下的治疗效应，是不同的，它们可能都是值得关注的临床问题。但是，在这两种情况下，为估计这些治疗效应，相应的临床试验设计通常会考虑到在医学上需要使用额外药物的可能性。对于前一问题，使用额外治疗后的观测值是有意义的。对于后一问题，额外治疗后的观测值则无直接相关性，因为这些数值也反映了额外治疗的影响。在估计目标最终确定之前，应确认能够得到可靠的估计，包括后一个问题中，用什么方法替代未被分析使用的观测值。

在构建估计目标时，有必要清楚地了解相关临床问题所涉及的治疗（见 A.3.3.）。对治疗的明确说明可能已经反映了多个相关的伴发事件。具体而言，治疗可能已经反映了临床关注问题所涉及的以下变化：基础治疗、合并用药、使用额外或后线治疗、转组治疗和预处理方案。例如，可以将治疗指定为干预 A加基础治疗B，并按需给药。这种情况下，无需将基础治疗B剂量的变化视为伴发事件。但是，需要将额外治疗视为伴发事件。如果治疗还涉及额外药物的使用，例如在使用疗法策略时，可将治疗定义为干预A加基础治疗B，按需给药，并按需使用额外药物。或者，如果治疗定义为干预 A，那么基础治疗的变化和额外治疗的使用都将视为伴发事件。

还应讨论是否通过明确人群和变量属性来说明相关临床问题的伴发事件。然后可以考虑有关任何其他伴发事件的策略。通常需要反复讨论来确定对于决策具有临床相关性，并且能得出可靠估计值的估计目标。一些估计目标，特别是那些观测值与临床问题相关的估计目标，通常可以通过很少的假设做出稳健的估计。而有些估计目标的分析方法可能需要更具体的假设，这些假设可能更难以论证，并且可能对假设的合理变化更敏感（见 A.5.1.）。对于某一特定的估计目标，如果在试验设计的合理性或者估计值的可靠性方面存在明显不足的话，就需要考虑另一种估计目标、试验设计和分析方法。

省略或过度简化讨论和构建估计目标的过程，会产生导致试验目的、试验设计、数据收集和分析方法之间不一致的风险。当无法得出可靠的估计值时，可能会妨碍某些策略的选择，重要的是要从试验目的和对临床相关问题的理解出发，而不是为了选择数据收集和分析方法，来确定估计目标。

还应考虑试验现状。如果根据临床试验方案对受试者的管理（例如，因不耐受而进行剂量调整，因应答不足而做的补救治疗，临床试验评估所带来的负担）被证实与临床实践中所预期的不同，这可能要在估计目标的构建中有所体现。

估计目标的构建，应该明确和清晰地定义估计的目标。以终止治疗伴发事件为例，最重要的是区分基于假设接受试验药物就能继续治疗的患者主层的治疗效应与实际持续治疗期间的效应。此外，如果所有患者都能继续治疗，则这两种情况都不能用来反映效应。

如上所述，当使用假想策略时，某些情形更可能为监管决策所接受。因此，所描述的假想情形应该能合理地用来量化可解释的治疗效应，为监管机构做决策和临床实践中药物的使用提供相关信息。假如未获得补救药物，变量值会是多少也许是一个重要问题。相反，如果假想情形中因药物不良反应而终止治疗的受试者实际上继续接受治疗，那么相关变量值在这一假想情形下会是多少这个问题可能不具有临床或监管意义。假如所有受试者都能够继续接受治疗，但没有对他们会继续这一假想情形进行充分讨论的话，则基于该效应的临床问题的定义是不充分的。药物不耐受本身可能就构成了无法达到有利结局的证据。

使用基于疗法策略的估计目标来描述获益效应以支持监管决策，可能更被普遍接受，特别是在某些情况下，尽管基于其他策略的估计目标可能被认为更具临床意义，但是无法找到其公认的能支持可靠估计值或稳健推断的主估计方法和敏感性估计方法。基于疗法策略的估计目标仍然可能得到具有临床相关性的可靠估计值。在这种情况下，建议还包括那些被认为具有更大临床相关性的估计目标，并给出所得到的估计值，以及关于该特定方法在试验设计或统计分析方面的局限性的讨论。在构建基于疗法策略的估计目标时，可以对使用该策略的每个伴发事件定义额外的估计目标和分析来补充推断；例如，对比各治疗组中治疗对症状评分的治疗效应和使用额外药物的受试者比例。类似的，使用在治策略的估计目标通常应该有伴发事件发生时间分布的附加信息，基于主层策略的估计目标通常有关于该主层中患者比例的信息（如果有）。

基于非劣效性或等效性目的构建估计目标以支持监管决策的考虑，可能与基于优效性目的的估计目标不同。正如在ICH E9中所解释的，非劣效性或等效性研究与优效性研究相比，监管机构在决策中面临的问题是不同的。在第 3.3.2.节中指出，这类试验本质上不保守，重要的是尽量减少方案违背和偏离、不依从和退出研究的数量。在第5.2.1.节中指出，全分析集（FAS）的结果一般不是保守的，它在这类试验中的作用应该被认真考虑。使用疗法策略来说明由一个或多个伴发事件构建的估计目标，对于非劣效性和等效性试验而言，存在的问题与ITT原则下使用FAS分析存在的相关问题类似。终止随机分配的治疗或因各种原因使用另一种药物，两个治疗组的应答情况可以表现得更相近，而这与最初的随机分配的治疗组间的相似度无关。可以构建估计目标来直接说明那些可能导致治疗组之间差异被弱化的伴发事件（例如终止治疗和使用额外药物）。在选择策略时，区分用于检测出含有相同或相似药物活性成分的治疗之间是否有差异的试验（例如，将生物类似物与参比治疗进行比较）与采用非劣效性或等效性假设来建立和量化有效性证据的试验是很重要的。为适用于监管决策，可以构建一个针对治疗效应的估计目标，优先考虑能更灵敏地检测出治疗间效应的差异。

A.4. 对试验设计和实施的影响

试验设计需要与反映试验目的的估计目标相一致。一种适用于某个估计目标的试验设计，不一定适用于其他具有潜在重要性的估计目标。治疗效应的量化依赖于估计目标，而估计目标的明确定义应当为试验设计的选择提供相关信息。这包括定义目标人群的入选和排除标准，治疗（包括方案中允许的药物和禁用药物），以及患者管理和数据收集等方面。例如，如果关注的是不管是否发生特定伴发事件的治疗效应，则该试验应收集所有受试者的变量。或者，如果用于支持监管决策而制定的估计目标不需要收集伴发事件后的变量值，则应权衡为其他估计目标收集此类数据的益处与数据收集的复杂性和潜在缺陷。

应尽可能收集所有与估计相关的数据，包括伴发事件的特征、发生情况和时间。然而数据不是总能被收集到，不能违背受试者意愿将其留在试验中，而且在某些试验中，受试者数据的缺失按照设计是不可避免的，如生存结局试验中的管理性删失。相反，如发生终止治疗、转组治疗或使用额外药物等伴发事件，尽管这些测量结果可能并不相关，并不意味着在事件之后无法测量变量。对于死亡等终末事件，变量不能在伴发事件后进行观测，但这些数据通常都不应被视为缺失。

不收集评估估计目标所需的数据，将导致后续统计推断中的缺失数据问题。统计分析的合理性可能取决于不可验证的假设和缺失数据的比例，这些可能会削弱结果的稳健性（见 A.5.）。制定一个前瞻性计划收集详细的数据缺失的原因，将有助于区分伴发事件的发生与缺失数据。这样可改进分析，并可使敏感性分析的选择更合理。例如，将“失访”记录为“因缺乏疗效而终止治疗”可能更准确。在将其定义为伴发事件的情况下，可以选择相应策略，而不是将其作为缺失数据问题来处理。为减少缺失数据，可采取措施将受试者保留在试验中。然而，减少或避免临床实践中通常会发生的伴发事件的措施存在降低试验外部有效性的风险。例如，如果在临床实践中不会通过选择试验人群，或使用滴定方案，或使用合并用药来减轻毒性的影响，那么在试验中采取这些措施也就不合适了。

随机化和盲法仍然是对照临床试验的基石。避免偏倚的设计方法请见第 2.3.节。某些估计目标可能需要或受益于试验设计的运用，如导入期或富集设计、随机撤药设计或滴定设计。在对受试者进行随机化分组之前，利用导入期识别对药物耐受的受试者主层，这个办法是可取的。监管机构和申办方之间的交流需要考虑拟定的导入期是否适合于确定目标人群，以及后续试验设计（例如，洗脱期、随机化）的选择是否支持目标治疗效应的估计和相关推断。这些考虑可能会限制这些试验设计的使用，以及特定策略的使用。

对治疗效应的精确描述应当为样本量计算提供信息。在参照历史研究时应特别谨慎，这些历史研究可能隐含或明确地报告了基于不同估计目标的治疗效应或变异的估计值。当所有受试者均能为分析提供信息，且在目标效应量和预期方差中已考虑了相应的策略来反映伴发事件的影响时，则在计算样本量时通常不需要按预期退出试验的受试者比例额外增加样本量。

第 7.2.节说明了关于跨临床试验数据汇总的问题。强调对所关注的变量应有一致的定义，并且这可延伸用于估计目标的构建。因此，为了从多个临床试验获得综合性证据，在计划阶段应构建一个合适的估计目标，使其包含在项目里的所有试验方案中，并反映在相关试验的设计选择中。类似的考虑适用于meta分析的设计，使用已完成的试验中估计的效应量来确定非劣效界值，或者使用外部对照组来诠释单臂试验。在没有考虑和说明每个数据呈现或统计分析所对应的估计目标的情况下，不同来源数据之间的简单比较，或者来自多个试验的数据整合可能会产生误导。

总的来说，一个试验有可能将多个目的转化为多个估计目标，每个估计目标都与统计检验和估计相关联。此时，应当考虑其中的多重性问题。

A.5. 对试验分析的影响

A.5.1. 主要估计

对于治疗组相较于对照组的治疗效应，估计目标通过比较具有相似受试者的试验组和对照组的结果来进行估计。对于给定的估计目标，应采用与其相一致的分析方法（或估计方法），使所得估计值可以支持对结果的可靠解读。该分析方法还应能计算置信区间并进行统计学显著性的检验。可解释的估计值是否存在，一个重要的考虑因素是分析中需要作出的假设的程度。关键假设应与估计目标及其主要估计方法和敏感性估计方法一起明确说明。假设应是合理的，要避免不恰当的假设。对于潜在的偏离假设的情形，结果的稳健性应通过与估计目标相对应的敏感性分析进行评估（见 A.5.2.）。如果一个估计依赖于多种假设或强假设，则需要更广泛的敏感性分析。如果偏离假设所造成的影响不能通过敏感性分析进行全面研究，那么这种特定的估计目标和分析方法的组合可能无法用于决策。

所有的分析方法都依赖于假设，而且即使对应于相同的估计目标，不同的分析方法也可能依赖于不同的假设。尽管如此，对于使用每种不同策略的估计目标，相对应的所有方法中都存在一些固有的假设。例如，预测在假想情形下本可观察到的结局的方法，或者在主层策略下确定合适目标人群的方法。下面列举了一些关于针对不同伴发事件采用不同策略的例子。在申办方和监管机构就估计目标、主要分析和敏感性分析达成一致之前，这里所强调的问题将是双方讨论的关键。

与针对某一伴发事件的疗法策略相对应的分析，根据试验的设计和实施，可能需要或强或弱的假设。如果在相应的伴发事件（如治疗终止）之后仍对大多数受试者进行随访，那么缺失数据问题可能相对较小。相反，如果伴发事件后停止观察（对于该策略显然是不提倡的），那么假设终止治疗受试者的（未观察到的）结局与继续治疗的受试者的（观察到的）结局相似通常是不合理的。此时在处理缺失数据时需要对其他方法进行论证，并进行敏感性分析。

与假想策略相对应的分析方法所涉及的结局与实际观测的结局不同；例如，尽管实际上给予了补救药物，但需要估计未给予补救药物时的结局。在给予补救药物之前的观测值与不需要补救药物的受试者的观测值可能提供有效信息，但需要更强的假设。

复合变量策略可以通过将伴发事件的发生视为结局的组成部分来避免对伴发事件后的数据做出统计学假设。这种情况下，潜在的担忧往往不在于估计中相应的假设，而在于对治疗效应估计结果的解释。为使估计目标得以合理解释，如果将伴发事件的发生认定为失败，并给出评分，这个评分应该能够有效地反映出患者获益的缺乏程度（例如，对死亡与因不良事件导致处理终止的反映也许不同）。

如果伴发事件发生之前的结局已被收集，那么依据在治策略所构建的估计目标就可以被估计。同样的，此处的关键假设将影响到结果的解释。以终止治疗为例，治疗过程中结局可能会改善，但同时治疗也可能因为引发、延迟、终止治疗等原因，使得治疗期缩短或延长。此类影响应在解释和评估临床获益时予以考虑。

与主层策略相对应的分析通常需要较强假设。例如，一些主层方法是基于受试者的基线特征而推断出来的，但这种推断的正确性可能难以评估。而且，这种困难不能通过简化的方法来避免。例如，假设伴发事件与处理无关，并简单地比较试验组和对照组中未出现伴发事件的受试者，这是非常难以论证的。

即使以恰当的方式定义了解决伴发事件的估计目标，并努力收集估计所需的数据（见 A.4.），一些数据仍然可能缺失。例如，生存结局试验中的管理性删失。未能收集到相关数据不应与选择不收集或选择收集但不使用（因伴发事件变得无关的数据）相混淆。例如，对于基于疗法策略的估计目标，在终止试验药物后的数据仍应被收集，如果未收集，则视为数据缺失；然而，对另一种策略而言，相同的数据点可能不相关，因此，对于相应的估计目标，此类未收集数据则不会被视为缺失。如果数据收集不完整，则有必要在统计分析中对缺失数据的处理做出一些假设。缺失数据的处理应基于临床上的合理假设，并在可能的情况下以估计目标描述中采用的策略为指导。采取的方法可能基于个体受试者和与其相似受试者所观测到的协变量和基线后数据。识别相似受试者的标准可能包括是否发生伴发事件。例如，对于终止治疗但未收集更多数据的受试者，可使用终止治疗但继续收集数据的其他受试者的数据来建模。

A.5.2. 敏感性分析

A.5.2.1. 敏感性分析的作用

基于特定估计目标的统计推断，应该对数据的局限以及主估计方法统计模型中假设的偏离具有稳健性。这种稳健性应通过敏感性分析来评价。对于所有用于监管决策和说明书制定的估计目标的主估计方法，都应有相应的敏感性分析计划。此问题需要在申办方和监管机构之间讨论并达成一致。

支持主估计方法的统计假设应明确记录。对于同一估计目标，应该预先规定一项或多项分析来评估这些假设，目的是验证根据主估计方法得出的估计值是否对假设偏离具有稳健性。其衡量标准可以是对假设不同程度的偏离是否会改变结果的统计学或临床意义（如临界点分析）。

敏感性分析旨在探索偏离假设时分析结果的稳健性，与此不同的，为了更全面地研究和理解试验数据而进行的其他分析可称为“补充分析”（见词汇表，A.5.3.）。如果申办方和监管机构就所关注的主要估计目标达成一致，并预先明确规定了主估计方法，且敏感性分析也验证了估计值的结果解释是可靠的，则补充分析在结果评估中通常不被优先考量。

A.5.2.2. 敏感性分析的选择

当计划和实施敏感性分析时，同时改变主要分析的多个方面可能难以确定由哪些假设导致了目前所观测到的潜在差异。因此，通常采用结构化的方法，指定不同分析背后的假设的变化，而不是简单地基于一组不同的假设比较不同分析的结果。应根据具体情况考虑是否需要同时改变多个假设的分析。在评估不同分析的解释和相关性时，区分可验证的和不可验证的假设可能是有帮助的。

在本文所设立框架中，进一步明确了对缺失数据进行敏感性分析的必要性和重要性。缺失数据应依据特定估计目标进行定义和考虑（见 A.4.）。对应于特定估计目标的缺失数据，以及与特定估计目标不直接相关的数据，两者之间存在区别，由此在分析中产生了不同类别的假设，需要通过敏感性分析来检查。

A.5.3. 补充分析

试验结果的解释应侧重于对应每个估计目标的主估计方法，并通过敏感性分析验证相应估计值的稳健性。除了主要分析和敏感性分析之外，还可以对估计目标进行补充分析，以提供对治疗效应更全面的了解。补充分析在解释试验结果方面的作用通常较小。每项试验均需考虑补充分析的必要性和作用。

第 5.2.3.节指出，同时基于全分析集（FAS）和符合方案集（PPS）的分析计划通常是适当的，从而它们之间的差异会成为讨论和结果解读的关键。如果基于FAS分析和PPS分析的结果一致，则可增强试验结果的可信度。第5.2.2.节还指出，基于PPS的结果可能会产生严重偏倚。就本增补中提出的框架而言，可能无法构建与PPS分析相对应的估计目标。如上所述，PPS分析不能实现在任何主层（例如，在能够耐受并继续接受试验药物的受试者）中估计效应的目的，因为PPS所比较的受试者在不同治疗组之间可能不具有可比性。

即使没有发生伴发事件，方案违背和偏离（例如，在时间窗外进行访视）也可能会使受试者从PPS中被排除。同样，受试者可能发生伴发事件（例如死亡）但却没有偏离方案。尽管违背和偏离方案与伴发事件之间存在差异，在估计目标的描述中仍应考虑可能影响观测结果解释或存在的事件。通过构建估计目标和相应的分析方法，可能更好地反映与PPS分析相关的目标。此时，PPS分析也许不能提供额外的信息。

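A.6. Documenting Estimands and Sensitivity Analysis

The trial protocol should define and explicitly state the primary estimand that corresponds to the primary trial objective. The protocol and analysis plan should pre-specify the main estimator that is aligned with the primary estimand and corresponds to the primary analysis, together with suitable sensitivity analyses to explore the robustness of results when assumptions are violated. For secondary trial objectives that may support regulatory decisions (for example, those related to secondary variables), the corresponding estimands should also be clearly defined and stated, each with a corresponding main estimator and suitable sensitivity analyses. Additional trial objectives of an exploratory nature may also be considered, and these give rise to additional estimands.

The choice of the primary estimand is usually the main determinant of trial design, conduct and analysis. As a matter of routine, this information should be documented in detail in the trial protocol. If secondary estimands are also of key interest, these considerations can be extended to them and likewise documented in the protocol. Beyond this, the conventional considerations for trial design, conduct and analysis remain unchanged.

Although stating clearly what is to be estimated benefits the sponsor, regulators do not require that an estimand be documented for every exploratory objective.

The clinical trial report should report the results of the main, sensitivity and supplementary analyses systematically, and should state for each analysis whether it was pre-specified, introduced while the trial was still blinded, or performed post hoc. The number and timing of each type of intercurrent event in each treatment group should be summarised.

Changing the estimand during the trial can be problematic and can reduce the credibility of the trial. For intercurrent events that were not foreseen at the design stage but were identified during trial conduct, the discussion should cover not only the choice of analysis method but also the effect on the estimand, that is, on the description of the treatment effect being estimated, and on the interpretation of the trial results. A change to the estimand should usually be reflected through a protocol amendment.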

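Glossary

Term	Definition
Estimand:	A precise description of the treatment effect reflecting the clinical question posed by the trial objective. It summarises at a population level what the outcomes would be in the same patients under the different treatment conditions being compared.
Estimate:	A numerical value computed by an estimator.
Estimator:	A method of analysis to compute an estimate of the estimand using clinical trial data.
Intercurrent event:	An event occurring after treatment initiation that can affect either the interpretation or the existence of the observations associated with the clinical question of interest. Intercurrent events need to be addressed when describing the clinical question of interest in order to define precisely the treatment effect to be estimated.
Missing data:	Data that would be meaningful for the analysis of a given estimand but were not collected. They should be distinguished from data that do not exist or data that are not considered meaningful because of an intercurrent event.
Principal stratification:	Classification of subjects according to the potential occurrence of an intercurrent event on all treatments. With two treatments, there are four principal strata with respect to a given intercurrent event: subjects who would not experience the event on either treatment, subjects who would experience the event on treatment A but not on treatment B, subjects who would experience the event on treatment B but not on treatment A, and subjects who would experience the event on both treatments. In this document, a principal stratum refers to any of the strata (or combination of strata) defined by principal stratification.
Sensitivity analysis:	A series of analyses conducted to explore the robustness of inferences from the main estimator to deviations from its underlying modelling assumptions and to limitations in the data.
Supplementary analysis:	A general description for analyses conducted in addition to the main and sensitivity analyses with the intent of providing additional insight into the understanding of the treatment effect.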

ICH E10

ICH E11

ICH E12

ICH E14

ICH E15

ICH E16

ICH E17

ICH E18

ICH E19

ICH E20

ICH E20 Adaptive Designs for Clinical Trials

1. INTRODUCTION AND SCOPE

This document provides guidance on confirmatory clinical trials with an adaptive design intended to evaluate a treatment for a given medical condition within the context of its overall development program. For the purpose of this guideline, an adaptive design is defined as a clinical trial design that allows for prospectively planned modifications to one or more aspects of the trial based on interim analysis of accumulating data from participants in the trial. The term prospectively planned means that the potential trial adaptations are pre-specified in the clinical trial protocol prior to initiation of the trial. The scope of this guideline does not include trials with unplanned modifications to the design, such as a protocol amendment proposed by an independent data monitoring committee (IDMC) based on unexpected interim results. It also does not include design changes based entirely on emerging information from a source external to the trial. Routine monitoring of operational aspects such as the enrollment rate, data quality, or extent of participant withdrawal is also out of scope.

The focus of this guideline is on principles for the planning, conduct, analysis, and interpretation of trials with an adaptive design intended to confirm the efficacy and support the benefit-risk assessment of a treatment. The emphasis is on principles that are critical to ensuring the trials produce reliable and interpretable information and that require specific considerations with use of an adaptive design. This guideline does not discuss the use of specific statistical methods. Although the guideline primarily focuses on confirmatory clinical trials, the principles outlined are relevant to all phases of clinical development.

2. ADVANTAGES AND CHALLENGES OF ADAPTIVE DESIGNS

At the planning stage of confirmatory trials, uncertainty may remain regarding design aspects such as the appropriate sample size, even after careful planning and conduct of earlier phases of drug development. Yet, with a non-adaptive design, these aspects have to be determined before the trial starts and cannot be changed during trial execution. Adaptive designs provide flexibility and the ability to safeguard against inaccurate assumptions by taking advantage of the accumulating information from trial participants and allowing pre-specified modifications to design aspects during the trial.

This added flexibility can lead to a variety of advantages. First, adaptive designs can provide ethical advantages. For example, a group sequential design with the potential for early trial stopping if there is convincing evidence the treatment is efficacious and has a positive benefit-risk profile can reduce the number of participants exposed to an inferior control. Second, adaptive designs can improve the efficiency of a trial, for example, by increasing its power for a given expected sample size. Third, adaptive designs can help improve understanding of treatment effects and decision-making. For example, a confirmatory two-stage adaptive design with selection between two doses at an interim analysis may reduce uncertainty about the dose with the better benefit-risk profile while also allowing for confirmation of the efficacy of the selected dose.

However, adaptive designs also present challenges, as they may add complexities and uncertainty related to the key principles discussed in Section 3. For example, use of an adaptive design may add logistical difficulties in maintaining confidentiality of interim results and introduce risks to trial integrity which, if not properly addressed, may lead to unreliable results and complications with their interpretation at trial end. In addition, appropriate planning for and assessment of a trial with an adaptive design can be more complex and may require more time than for a trial without an adaptive design. In particular, use of conventional analysis methods that would apply in non-adaptive designs usually leads to an increased Type I error probability and biased treatment effect estimates. For example, in a design with an interim analysis to modify the target sample size based on the estimated treatment effect, the Type I error probability can be more than doubled when using analysis methods that do not account for the adaptation. As another example, the potential for early stopping for efficacy may lead to biased treatment effect estimates because the trial will be stopped preferentially when extreme data have been observed. Therefore, special analysis methods for hypothesis testing and estimation that account for the adaptive design usually need to be used. In addition, some trials with adaptive designs may provide less information about safety, potentially leading to more uncertainty during benefit-risk assessments. Also, adaptive designs may not be beneficial in all clinical trial settings. For example, adaptive designs may not be favored if there is fast enrollment of participants relative to the assessment time of the endpoint on which the adaptation is based, or if data cannot be made available quickly enough to facilitate reliable adaptation decisions at an interim analysis.
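
The statement that the Type I error probability can be more than doubled can be illustrated with a worst-case calculation. The sketch below is not taken from the guideline. It assumes a one-sided z-test with known variance, a single interim look, and a naive final test that pools both stages and ignores the adaptation. The second-stage sample size is chosen, after seeing the interim statistic, to maximise the chance of a naive rejection. Function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def worst_case_conditional_error(z1, c):
    """Largest chance of a naive final rejection given interim statistic z1.

    The naive test pools both stages and rejects when the pooled z exceeds c.
    Maximising over the second-stage size n2 = r * n1 gives, for 0 < z1 < c,
    r = c**2 / z1**2 - 1 and a conditional error of 1 - Phi(sqrt(c^2 - z1^2)).
    """
    if z1 >= c:
        return 1.0                      # stop at the interim and reject
    if z1 <= 0:
        return 1.0 - STD_NORMAL.cdf(c)  # limit as n2 grows without bound
    return 1.0 - STD_NORMAL.cdf(math.sqrt(c * c - z1 * z1))

def worst_case_type1_error(alpha=0.025, step=0.001, limit=8.0):
    """Integrate the worst-case conditional error over the null law of z1."""
    c = STD_NORMAL.inv_cdf(1.0 - alpha)
    total = 0.0
    n_steps = int(2 * limit / step)
    for i in range(n_steps):
        z1 = -limit + (i + 0.5) * step  # midpoint rule over the interim z
        total += STD_NORMAL.pdf(z1) * worst_case_conditional_error(z1, c) * step
    return total

print(round(worst_case_type1_error(), 4))
```

Under these assumptions the result is roughly 0.06 against a nominal one-sided level of 0.025. Realistic adaptation rules are less extreme, so this is an upper bound for this setting and not a typical value.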

The decision to use or not use a specific adaptive design in a clinical trial will depend on many factors, including the ones described above. There can be a tension between the confirmatory nature of a late-stage clinical trial and the proposal to adapt aspects of the trial while it is ongoing. In planning an adaptive design, it is therefore essential to carefully justify the need to adapt the trial and assess potential implications of the type, number, and complexity of the adaptations involved. The justification should include both clinical and statistical considerations. It should weigh the advantages of the design against the extent to which the adaptations being considered add uncertainty about the trial’s ability to produce reliable and interpretable results. For example, the addition of a carefully planned interim analysis to potentially stop a trial early for efficacy or futility using appropriate pre-specified stopping rules and ensuring sufficient information for safety and benefit-risk assessment, along with use of an IDMC to maintain trial integrity, may add minimal uncertainty. On the other hand, a complex design involving adaptations to multiple trial features may add considerable uncertainty related to maintaining trial integrity. This could include uncertainty about the adequacy of information flow and data access specifications, or the potential impact of the adaptation itself on trial conduct and the trial’s ability to provide interpretable treatment effects. This can lead to challenges in assessing results and in regulatory decision-making about the efficacy and benefit-risk profile of a proposed dose of a treatment for a specific patient population. A proposed adaptive design requires a clear and compelling justification. 
This justification should discuss how the proposed design addresses inherent needs of the clinical setting and should provide an evaluation of advantages and limitations as compared to alternative designs (including non-adaptive designs), including a comparison of important trial operating characteristics (e.g., power, expected sample size, reliability of adaptation decisions) between candidate designs.

3. KEY PRINCIPLES

For the purpose of this guideline, a principle refers to a characteristic of a trial design that is critical to ensure the reliability and interpretability of the results. This section describes principles that require specific considerations with an adaptive design. The focus is on proposals for confirmatory trials with an adaptive design. All of these principles should be followed regardless of the type of adaptation and statistical approach (e.g., frequentist or Bayesian methods).

3.1 Adequacy Within the Development Program

It is important that clinical trials are properly designed, conducted, and analyzed to address the clinical research question(s) of interest within the context of an overall development program. A stepwise program with careful analysis and evaluation of completed exploratory trials helps inform the goals and design choices for subsequent confirmatory trials and ultimately generate data necessary for regulatory decision-making. A complete development program should seek to, among other aspects: characterize the dose-response relationship with respect to favorable and unfavorable effects; identify an appropriate patient population for treatment; select clinically meaningful and sensitive endpoints; and reliably confirm efficacy and support the assessment of safety and benefit-risk in the intended patient population.

The number and complexity of adaptations at the confirmatory stage should generally be limited. Increasing either of them, as a replacement for a sequence of multiple trials, can impair the ability to answer important clinical questions and limit the opportunity to carefully reflect on prior results to design a development program most effectively. Before planning a confirmatory trial with multiple adaptations, sponsors should discuss whether additional exploratory trials are necessary to investigate the question(s) addressed by the proposed adaptation(s).

For example, consider a confirmatory two-stage adaptive clinical trial design with selection between two doses at an interim analysis, and confirmation of efficacy of the selected dose. In a setting where a dose-ranging trial has been conducted with remaining uncertainty about the most appropriate of two candidate doses, such a design may help ensure identification of the dose with the better benefit-risk profile in the intended patient population. However, if a proper dose-ranging trial was not conducted in earlier stages of the development program, the selection of two doses for the confirmatory trial(s) may not be well supported, adding risk that the program may fail to identify an appropriate dose. An adaptive design should generally not serve as a replacement for a proper dose-ranging trial. It is generally expected that the sponsor has completed the necessary trials to evaluate a wider range and number of doses before proceeding to the confirmatory trial(s) intended to confirm efficacy and assess benefit-risk.

3.2 Adequacy of Trial Planning

Adequate planning is important for all clinical trials to ensure the design is pre-specified, conduct and analysis are appropriate, and results are reliable and interpretable. If a confirmatory clinical trial is planned with an adaptive design, the number and complexity of adaptations should generally be limited and there should be a justification for adapting aspects of the trial at this stage of drug development. Prior to initiation of a trial with an adaptive design, further aspects should be specified and justified in addition to the typical components of trial planning. These include the number and timing of interim analyses, type of adaptation, statistical methods for producing interim results, anticipated rule governing the adaptation decision, statistical methods for the primary analysis aligned to each targeted estimand, and approaches to maintain trial integrity. For adaptive designs with a planned selection of an estimand at an interim analysis, such as treatment- or population-selection designs (Sections 4.3 and 4.4), all candidate estimands should be fully pre-specified and clinically relevant.

Some types of adaptive designs may require more planning than others. For example, a design with unblinded sample size adaptation warrants more extensive approaches to maintain trial integrity than one with blinded sample size adaptation. If simulations are critical to understand operating characteristics of an adaptive design, the simulation study should be carefully planned, conducted, and reported (Section 5.2). All relevant details pertinent to the planning of an adaptive trial design should be appropriately documented (Section 6).

Adequate planning facilitates the evaluation of the appropriateness of the statistical approach for many types of adaptations. For example, Type I error probability control requires the pre-specification of criteria for early efficacy stopping or rules for combining evidence across stages. As another example, specifying a blinded sample size adaptation in the protocol, together with the adaptation rule, increases confidence that an adaptively selected sample size was not influenced by unblinded data. Adequate planning also facilitates the evaluation of trial operating characteristics and enables informed discussions with the IDMC (if involved in the adaptations). Sponsors should discuss the type of adaptations and anticipated adaptation rules in detail with the IDMC to confirm its understanding and support. This ensures the IDMC is prepared to review interim results and make adaptation recommendations during the trial while also protecting individual trial participants’ safety.

There should always be a clear description of the anticipated rule on which the adaptation will be based. The extent to which the anticipated rule governing the adaptation decision needs to be adhered to at an interim analysis, however, can vary depending on the type of adaptation and the statistical inferential methods being used. It is generally recommended to use analysis methods that provide valid inference while allowing flexibility to deviate from the anticipated adaptation rule based on the overall benefit-risk assessment at an interim analysis. For example, consider a confirmatory two-stage adaptive clinical trial with selection between two doses at an interim analysis, with the objective to confirm the efficacy and support the benefit-risk assessment of the selected dose. At the trial planning stage, an efficacy-based rule for the interim dose selection may be planned given that no meaningful safety issues are expected. There is a chance, however, that interim data will suggest similar efficacy between the two doses, with an unexpected safety concern for the higher dose. When using statistical methods that allow for the flexibility to incorporate such benefit-risk considerations at the interim analysis, the pre-specified plan should acknowledge the possibility of deviations from the rule and outline factors that may lead to such deviations. If the planned statistical methods instead require strict adherence to the rule governing the interim decision to ensure valid inference (e.g., Type I error probability control), the importance of adhering to the rule should be documented in the trial protocol.

3.3 Limiting the Chances of Erroneous Conclusions

It is important to limit the chances of erroneous conclusions about the efficacy, safety, and benefit-risk profile of a proposed treatment. An essential element of regulatory decision-making is controlling the chances of false positive efficacy conclusions (i.e., conclusions that truly inefficacious treatments are efficacious). The common approach is to limit the probability of false positive efficacy conclusions within a trial by using frequentist methods that control the Type I error probability for a hypothesis test of the primary estimand at a pre-specified threshold (ICH E9).

For most adaptive designs, it is necessary to use specific methods to control the Type I error probability. For example, if a design includes an interim analysis with the potential for early stopping for efficacy, appropriate pre-specified stopping rules are needed. When an adaptive trial design includes multiple testing approaches to control the Type I error probability across multiple primary and/or secondary endpoints, those approaches should additionally address the potential for an increased Type I error probability due to the proposed adaptation.

Although the predominant approaches to the design and analysis of clinical trials have been based on frequentist statistical methods, other approaches may be appropriate when the reasons for their use are clear and when the resulting conclusions are sufficiently robust (ICH E9). Section 5.3 describes important considerations for limiting the chances of false positive efficacy conclusions in adaptive designs using Bayesian methods.

It is also important to understand how a proposed adaptive design may impact the potential for other types of erroneous conclusions. This includes the need for the trial to provide sufficient information on safety, important secondary efficacy endpoints, and relevant patient subgroups to inform a reliable benefit-risk assessment. For example, when planning a trial with the potential to stop early for an efficacy conclusion, it is important to justify that the sample size and duration of follow-up at an interim analysis can adequately support a reliable benefit-risk assessment. This also includes evaluation of the impact of adaptive designs on conclusions made at interim analyses, and the risk that the adaptive design may be inadequate to fulfill the trial objectives. For example, sponsors should evaluate the ability of an adaptive dose-selection design to select the better out of two doses at an interim analysis based on efficacy and benefit-risk considerations. Finally, adaptations can impact the chance of a false negative efficacy conclusion (i.e., lack of evidence of an effect for a truly efficacious treatment) such that it is important to evaluate whether the trial achieves adequate power.

3.4 Reliability of Estimation

Controlling the chances of false positive efficacy conclusions is expected in a confirmatory clinical trial (Section 3.3). In addition, reliable estimation of treatment effects for the primary efficacy endpoint and other key efficacy and safety outcomes is important to facilitate the benefit-risk assessment and inform regulatory decision-making. The primary analysis of a trial with an adaptive design should therefore provide an estimate of the treatment effect that is reliable and aligned with the estimand of interest. Sponsors should evaluate bias and variability of treatment effect estimates, including measures such as the mean squared error. In the trade-off between bias and variance, the expectation is generally for limited to no bias in the primary estimate of the treatment effect. The primary analysis should also support calculation of accurate measures of uncertainty such as confidence intervals with targeted coverage probabilities.

If a trial with an adaptive design uses approaches for estimation in the primary analysis that do not account for the adaptive nature of the design, unreliable treatment effect estimates and incorrect estimates of uncertainty (e.g., incorrect confidence interval coverage) may arise. For example, selecting the treatment with the largest estimated effect from among several treatments at an interim analysis will, on average, lead to an overestimation of that treatment’s effect. This holds true even if selection is based on an endpoint expected to be predictive of efficacy rather than the primary endpoint itself. Similarly, treatment effect estimates for secondary endpoints may be biased in the presence of adaptations. Adaptive design proposals should therefore evaluate bias and variability of treatment effect estimates and provide support of their reliability. In some cases, bias and variability can be calculated analytically. In other cases, the evaluation has to rely on simulations. For some designs, specific estimation methods have been derived with improved reliability, and these should be used. As one example, methods are available in group sequential designs for adjusting estimates to reduce or remove bias associated with the potential for early stopping and to improve performance on measures such as the mean squared error.
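
The overestimation caused by selecting the arm with the largest interim estimate can be seen in a small simulation. This is an illustrative sketch under assumptions not found in the guideline: all arms share the same true effect, interim estimates are independent and normally distributed with a common standard error, and the function name and inputs are hypothetical.

```python
import random

def mean_selected_estimate(true_effect, std_error, n_arms, n_sims, seed=1):
    """Average interim estimate of the arm picked for having the largest estimate."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        # One interim estimate per arm, all centred on the same true effect.
        estimates = [rng.gauss(true_effect, std_error) for _ in range(n_arms)]
        total += max(estimates)
    return total / n_sims

avg = mean_selected_estimate(true_effect=0.0, std_error=1.0, n_arms=3, n_sims=20000)
print(avg)
```

Although every arm has a true effect of zero here, the average estimate for the selected arm is clearly positive. For three arms with unit standard error the theoretical value is 3 / (2 * sqrt(pi)), about 0.85.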

In addition to ensuring reliable estimation of the treatment effect in the primary analysis, it is also important to support that estimates at interim analyses can facilitate reliable adaptation decisions. For example, conducting an interim analysis in an adaptive dose-selection design at an early time point may result in highly variable estimates and the selection of an inferior dose. Sponsors should therefore evaluate the overall operating characteristics of the design (e.g., probability of selecting the better dose) to inform careful selection of the timing of an interim analysis and the adaptation rules.
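
How the timing of an interim analysis affects the chance of picking the better dose can be sketched with a closed-form calculation. The assumptions here are illustrative and not from the guideline: two dose arms with normally distributed outcomes, a common known standard deviation, equal interim sample sizes, and selection of the dose with the larger interim mean. The function name and inputs are hypothetical.

```python
import math
from statistics import NormalDist

def prob_select_better_dose(effect_gap, sd, n_per_arm):
    """P(the truly better dose has the larger interim mean), normal outcomes."""
    # Standard error of the difference between the two dose-arm means.
    se_diff = sd * math.sqrt(2.0 / n_per_arm)
    return NormalDist().cdf(effect_gap / se_diff)

for n in (10, 50, 100, 200):
    print(n, round(prob_select_better_dose(0.2, 1.0, n), 3))
```

With a true gap of 0.2 standard deviations, an interim look at 10 participants per arm selects the better dose only about two times in three, which is the kind of operating characteristic that should inform the timing of the interim analysis.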

3.5 Maintenance of Trial Integrity

It is important that the integrity of a trial is maintained such that it achieves its objectives in a reliable, ethical, and timely manner. The impact of trial adaptations on the statistical validity of trial results is discussed in Sections 3.3 and 3.4. Maintenance of trial integrity also relies on appropriate execution of the trial and careful assessment of the potential impact of envisaged adaptations on trial conduct, which is the focus of this section.

Knowledge by the sponsor, investigators, or trial participants about individual treatment assignments, accumulating data, or certain trial changes can impact trial integrity by affecting expectations and behaviors in ways that are difficult to predict and impossible to adjust for. Such knowledge can introduce subtle changes in trial conduct, such as changes in the pace and characteristics of participants enrolled, specific details of the administration of the study treatment or other medications, or endpoint assessments, that may impact the interpretation of trial results. For example, knowledge by investigators and trial participants of a small or unfavorable estimated treatment effect based on accumulating data during an ongoing trial could be misinterpreted as reliable evidence of no effect, causing decreased enrollment, adherence, and retention of trial participants, ultimately leading to unreliable results and difficulties with their interpretation at trial end. The recommended approach is to blind participants, investigators, and the sponsor to individual treatment assignments and to accumulating summary-level data in which treatment groups are identified (either with the actual treatments or with labels such as A and B), thereby limiting the risk of conscious and unconscious changes in trial conduct arising from such knowledge.

A fundamental aspect of many types of adaptive designs is the need for some level of access to unblinded interim results. Personnel having access to accumulating unblinded data should generally be independent in the sense that they do not have conflicts of interest or any role in trial activities and are external to the sponsor. To achieve this, an IDMC should be in place to review unblinded interim data when such access is needed as part of the adaptive design. In confirmatory trials, an IDMC will often already be planned to assure the safety of trial participants and to protect the scientific integrity of the trial. In this case, the IDMC can have an additional role of reviewing interim data for the purpose of implementing the planned adaptations. If an IDMC is not already planned, one can be set up with objectives and member expertise targeted toward implementing the adaptive design. Standard operating procedures and confidentiality agreements should be put in place to limit access to unblinded interim results beyond the IDMC. Additional discussion about the IDMC and other data monitoring considerations is available in Section 5.1.

Even the knowledge of an adaptation itself can lead to unwanted changes in behavior on the part of investigators or trial participants or can potentially reveal information about unblinded interim results. For example, if an unblinded sample size adaptation is implemented, where the revised sample size is a function of an interim treatment effect estimate, someone who understands the adaptation rule and knows the revised sample size can infer the interim effect estimate. Therefore, measures should be implemented to minimize the information that can be inferred, while maintaining ethical standards (e.g., adequate informed consent forms) and ensuring operational feasibility (e.g., adequate drug supply); see further discussion of operational considerations in Section 5.6. One particular approach to limit the knowledge that can be inferred during the trial is to use adaptation rules where a sufficiently large range of interim estimates leads to the same change (e.g., with a sample size adaptation rule that includes only a small number of potential adaptively selected sample sizes). Details of the adaptation rule could be reserved for a specific document rather than the protocol, such as a confidential appendix to the IDMC charter, that is only accessible to designated sponsor personnel separated from the team managing and conducting any aspects of a clinical trial. Additionally, sponsor personnel, investigators, and trial participants could be shielded from knowledge of specific adaptive changes. For example, trial sites could be informed after a sample size adaptation that the targeted enrollment has not been reached, or notified of site- or region-specific targets, rather than notified of the overall sample size target.
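
A coarse adaptation rule of the kind described above might look like the following sketch. The thresholds, sample sizes, and function name are hypothetical and chosen only for illustration. The point is that a wide range of interim estimates maps to the same revised sample size, so knowing the revised size reveals little about the interim estimate.

```python
def revised_sample_size(interim_effect, planned_n=400, increased_n=600,
                        promising_low=0.10, promising_high=0.25):
    """Coarse rule: only two possible outcomes for the total sample size."""
    # Increase only when the interim estimate falls in a "promising" band.
    if promising_low <= interim_effect < promising_high:
        return increased_n
    return planned_n

for effect in (0.05, 0.12, 0.24, 0.30):
    print(effect, revised_sample_size(effect))
```

Someone who learns that the sample size stayed at 400 cannot tell whether the interim estimate was below 0.10 or at least 0.25, and an increase to 600 only indicates a value somewhere in the band.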

Sponsors should discuss with regulators at the planning stage the potential implications of the adaptations on trial conduct, including the type of participants enrolled, and on the interpretation of the results at trial end. This should include a discussion of the sufficiency of the size of the trial stages for assessing the impact of adaptations. Sponsors should implement approaches for maintaining trial integrity. Processes should be documented to increase adherence to these approaches and to provide transparency to relevant stakeholders (e.g., regulatory authorities and participating investigators). Appropriate training and careful planning are needed to prevent compromises to the extent possible. Because even the most rigorous processes may not fully guarantee trial integrity, the interpretation of results at trial end should involve consideration of any heterogeneity between results from different stages of the trial, the nature of the adaptive design (e.g., the number and type of adaptations and the size of the stages of the trial), the processes in place and who had access to different kinds of data and information during the trial, and any notable changes in trial conduct before and after an interim analysis (e.g., changes in the types of participants enrolled). Unexpected heterogeneity findings should be discussed by the sponsor and may impact the interpretation of the trial results.

The principles for maintaining trial integrity discussed above are particularly critical in open-label trials in which each participant’s individual treatment assignment is known to the participant and/or investigator. Notably, even though individual participant assignments are known in such trials, it is feasible and strongly recommended to ensure that participants, investigators, and the sponsor do not have access to accumulating summary-level data by treatment group.

4. TYPES OF ADAPTATIONS

This section discusses common types of adaptations, with a focus on specific considerations relevant to the principles in Section 3. This section also illustrates some of the advantages and challenges of adaptive designs outlined in Section 2. The discussion focuses on designs using frequentist approaches for statistical analysis. For special considerations related to adaptive designs using Bayesian methods, see Section 5.3.

4.1 Early Trial Stopping

During the conduct of a clinical trial, accruing data can provide information that makes it no longer appropriate to continue the trial. To address this, sponsors can consider a trial design that includes prospectively planned sequential analyses of accumulating unblinded data with anticipated rules for stopping when there is compelling evidence of efficacy (stopping for efficacy) or when the trial is unlikely to demonstrate efficacy (stopping for futility). A clinical trial design that allows such sequential analyses for early efficacy stopping based on accumulating observations of groups of participants at pre-specified points throughout the trial is called a group sequential design.

When planning a trial design that allows for early efficacy stopping, appropriate stopping boundaries should be planned for the sequential analyses such that the Type I error probability is controlled. The timing of interim analyses and specific stopping rules should be justified based on factors such as the required persuasiveness of early results to stop the trial, the probability of early stopping, and the expected and maximum sample sizes or numbers of events that may be accrued. Approaches may be considered that allow deviation from the anticipated timing of interim analyses. For example, this could help accommodate the scheduling of IDMC meetings at specific calendar times, such that the actual sample size at an interim analysis may differ slightly from the pre-specified target. In addition, methods for calculating the primary treatment effect estimate and associated confidence interval that adjust for the interim analyses should be planned to limit bias and improve performance on measures such as the mean squared error (Section 3.4).
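
The effect of stopping boundaries on the Type I error probability can be checked numerically for a simple case. The sketch below assumes a two-sided test, one interim look at half of the total information, and normally distributed test statistics. The boundary constants 2.178 (Pocock-type) and 2.797 / 1.977 (O'Brien-Fleming-type) are standard tabulated values for two looks at a two-sided level of 0.05. They are used here as examples and are not recommended by the guideline, which does not discuss specific methods.

```python
import math
from statistics import NormalDist

PHI = NormalDist()

def two_look_type1_error(c1, c2, step=0.001):
    """Two-sided Type I error with looks at 50% and 100% of the information."""
    # Probability of rejecting at the interim look.
    error = 2.0 * (1.0 - PHI.cdf(c1))
    # Continue when |z1| <= c1; the final z is (z1 + w) / sqrt(2), w ~ N(0, 1).
    bound = c2 * math.sqrt(2.0)
    n_steps = int(round(2.0 * c1 / step))
    width = 2.0 * c1 / n_steps
    for i in range(n_steps):
        z1 = -c1 + (i + 0.5) * width  # midpoint rule over the continuation region
        p_final_reject = 1.0 - (PHI.cdf(bound - z1) - PHI.cdf(-bound - z1))
        error += PHI.pdf(z1) * p_final_reject * width
    return error

print(round(two_look_type1_error(1.96, 1.96), 3))    # unadjusted critical values
print(round(two_look_type1_error(2.178, 2.178), 3))  # Pocock-type boundary
print(round(two_look_type1_error(2.797, 1.977), 3))  # O'Brien-Fleming-type boundary
```

Using 1.96 at both looks gives an overall error above 0.08, while either adjusted boundary brings it back to about 0.05.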

A trial that is stopped early for efficacy will provide less information (e.g., because of a smaller sample size and/or shorter duration of follow-up) for the evaluation of safety, important secondary efficacy endpoints, and relevant patient subgroups, which are important for the overall benefit-risk assessment. Therefore, the timing of interim analyses should be selected such that the sample size is large enough and the duration of follow-up is long enough to ensure sufficient information is available for decision-making. There usually is a limit on how early interim analyses should occur or whether they should occur at all because a minimum sample size and/or duration of follow-up is expected for a sufficient evaluation of safety. This is often a relevant criterion, for example, in preventive vaccine trials and to meet regulatory standards for the extent of population exposure for treatments intended for long-term treatment of non-life-threatening conditions (ICH E1). Furthermore, interim analyses with the potential for early stopping are more often considered in circumstances where there are compelling ethical reasons (e.g., the primary endpoint is survival), and efficacy stopping rules typically require highly persuasive results in terms of both the magnitude of the estimated treatment effect and the strength of evidence of an effect.

In the case that a stopping rule at an interim analysis is met and a decision is made to stop the trial for efficacy, additional data beyond those included in the interim analysis may continue to accumulate on participants in the trial prior to the final database lock. This can occur as a result of a time lag between data collection and interim analysis during which data adjudication and cleaning are carried out. Sponsors should ensure this additional information is appropriately documented and should report results from the interim analysis and from the analysis based on all available data, which are both important for regulatory decision-making. For example, a change in the estimated treatment effect between these two analyses that may affect the benefit-risk assessment would warrant investigation of potential explanations and may make interpretation of the trial results challenging.

When a trial design incorporates the potential for futility stopping, while anticipated futility rules should be pre-specified and justified, it is generally recommended to use nonbinding futility rules. This means that the futility stopping criteria serve as guidelines that can be deviated from based on the interim results without increasing the Type I error probability. This flexibility is important because decision-making about whether to stop for futility or continue is usually not an algorithmic process and may need to incorporate additional information beyond the primary efficacy endpoint, such as safety or other efficacy data. In contrast, there have been proposals to use binding futility rules and adjust the efficacy decision criteria for the planned futility criteria. These approaches have the disadvantage of requiring that sponsors adhere to the pre-specified futility stopping criteria, as otherwise the Type I error probability is not controlled and the interpretation of trial results can be compromised.

4.2 Sample Size Adaptation

Even after a carefully planned and conducted early-phase development program, a considerable degree of uncertainty might exist in the parameter assumptions that affect the sample size calculations for a clinical trial. One source of uncertainty is the set of assumptions about nuisance parameters, which are not of primary interest but may affect the sample size of a trial. Examples of nuisance parameters include the standard deviation of a continuous outcome and the probability of response of the control arm for a binary outcome, which can be highly variable across trials in certain disease settings. In such cases where a sound rationale exists, sponsors may consider incorporating the potential for modification of the initial sample size based on interim estimates of nuisance parameter values to ensure the trial is adequately powered. Another source of uncertainty at the planning stage is the assumption about the anticipated treatment effect size. In cases where there is justification based on residual uncertainty (e.g., after appropriate exploratory trials; see Section 3.1), sponsors may consider a sample size adaptation based on an interim treatment effect estimate. The goal would be to ensure sufficient power under a range of plausible and clinically meaningful treatment effect sizes.

Appropriate planning of any design incorporating sample size adaptation should include pre-specification and justification of the minimum and maximum potential sample sizes, the anticipated sample size adaptation rule, and the statistical analysis method. It is important that the minimum sample size still provides sufficient information for benefit-risk assessments (e.g., for evaluating safety, secondary endpoints, and subgroup analyses), similar to considerations for early stopping (Section 4.1).

Adaptations to the sample size based on nuisance parameter estimates should be carried out using blinded data as this approach does not incorporate information about treatment assignment, thus minimizing risks for trial integrity. The anticipated sample size adaptation rule should be pre-specified to increase confidence that an adaptively selected sample size was not influenced by unblinded data. Such pre-specification also facilitates evaluation of trial operating characteristics (e.g., power and expected sample size). Sponsors should propose and justify a testing approach that controls the Type I error probability. In some cases, conventional analysis methods that would apply in non-adaptive designs can be used for the primary analysis if there is justification (e.g., in a reasonably sized two-arm superiority trial with a continuous endpoint). In other cases (e.g., a two-arm non-inferiority trial with a continuous endpoint), the use of these conventional methods may lead to an increase in the Type I error probability and different approaches are needed.
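
As an illustration of a blinded sample size adaptation based on a nuisance parameter, the sketch below recomputes the per-arm sample size of a two-arm superiority trial with a continuous endpoint from the standard deviation of the pooled (blinded) interim data. It assumes the usual normal-approximation formula n = 2σ²(z₁₋α + z₁₋β)²/δ², a one-sided alpha of 0.025, 90% power, and hypothetical minimum and maximum sample sizes. The function names are not from the guideline. Note that the one-sample standard deviation of pooled data tends to overestimate the within-arm standard deviation somewhat when a treatment effect is present.

```python
import math
from statistics import NormalDist, stdev


def per_arm_sample_size(sd: float, delta: float,
                        alpha: float = 0.025, power: float = 0.90) -> int:
    """Normal-approximation per-arm sample size for a two-arm comparison of means."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha)
    z_beta = std_normal.inv_cdf(power)
    return math.ceil(2.0 * (sd * (z_alpha + z_beta) / delta) ** 2)


def blinded_reestimate(pooled_outcomes, delta: float, n_min: int, n_max: int,
                       alpha: float = 0.025, power: float = 0.90) -> int:
    """Re-estimate the per-arm sample size from blinded interim data.

    pooled_outcomes contains outcomes from all arms with no treatment labels,
    so no information about treatment assignment is used.
    The result is restricted to the pre-specified range [n_min, n_max].
    """
    blinded_sd = stdev(pooled_outcomes)
    n_new = per_arm_sample_size(blinded_sd, delta, alpha, power)
    return max(n_min, min(n_max, n_new))


if __name__ == "__main__":
    # Planning assumption (hypothetical): SD = 10, clinically relevant difference = 5.
    print(per_arm_sample_size(sd=10.0, delta=5.0))
```

Restricting the result to a pre-specified range mirrors the expectation that minimum and maximum potential sample sizes are justified in advance. Whether a conventional final analysis is acceptable after such an adaptation depends on the setting, as discussed above.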

Trials with sample size adaptations based on interim effect estimates should use an IDMC and adequate processes to maintain trial integrity, given that the adaptations are based on unblinded data. This should include steps to minimize the information that can be inferred from the interim sample size selection (Section 3.5). Given that such designs typically allow for an increase in sample size compared to the initially planned sample size, statistical significance can be achieved with weaker observed effects than initially planned. When planning such a design, it is therefore important to judge the magnitudes of effects that would be clinically meaningful, justify the added participant exposure, and ensure that the potential sample sizes under the adaptive design are sensible from a clinical perspective.

It is generally recommended to use sample size adaptation methods that do not require adherence to the anticipated adaptation rule, such as hypothesis testing based on pre-specified weights for combining the information across trial stages. Still, the anticipated adaptation rule should be pre-specified to facilitate the evaluation of trial operating characteristics (e.g., expected sample size and power) and ensure that the IDMC understands and is in agreement with the anticipated adaptation rule.
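
One example of hypothesis testing based on pre-specified weights is the inverse normal combination test, sketched below under simplifying assumptions: two stages, no early stopping, a one-sided alpha of 0.025, stagewise z-statistics computed from disjoint sets of participants, and a known unit variance. The weights come from the planned stage sizes and are not changed when the second-stage sample size is adapted. The adaptation rule in `hypothetical_stage2_size` is purely illustrative and is not a recommended rule.

```python
import math
import random
from statistics import NormalDist


def inverse_normal_combination(z1: float, z2: float,
                               planned_n1: int, planned_n2: int) -> float:
    """Combine stagewise z-statistics with weights fixed at the planning stage."""
    total = planned_n1 + planned_n2
    w1 = math.sqrt(planned_n1 / total)
    w2 = math.sqrt(planned_n2 / total)
    return w1 * z1 + w2 * z2


def hypothetical_stage2_size(z1: float, planned_n2: int, max_n2: int) -> int:
    """Illustrative rule only: enlarge stage 2 when the interim trend is modest."""
    return max_n2 if 0.5 <= z1 < 1.5 else planned_n2


def simulate_type1_error(reps: int = 20000, seed: int = 2024) -> float:
    """Estimate the one-sided Type I error probability under no treatment effect."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(0.975)
    planned_n1, planned_n2, max_n2 = 100, 100, 300
    rejections = 0
    for _ in range(reps):
        z1 = rng.gauss(0.0, 1.0)                 # stage 1 statistic under the null
        n2 = hypothetical_stage2_size(z1, planned_n2, max_n2)
        se2 = math.sqrt(2.0 / n2)                # SE of the stage 2 mean difference
        diff2 = rng.gauss(0.0, se2)              # stage 2 mean difference under the null
        z2 = diff2 / se2
        if inverse_normal_combination(z1, z2, planned_n1, planned_n2) >= crit:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    print(f"Estimated Type I error: {simulate_type1_error():.4f}")
```

Because the second-stage statistic is standard normal under the null hypothesis whatever second-stage sample size is chosen, the combined statistic keeps its null distribution even if the interim decision deviates from the anticipated rule.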

For most designs involving adaptations to the sample size based on interim treatment effect estimates, conventional testing methods for non-adaptive designs are not appropriate and specific statistical methodology needs to be used to ensure Type I error probability control. In addition, conventional point estimates of the effect size may be biased, and conventional confidence intervals may have incorrect coverage probabilities. Therefore, it is recommended to evaluate the reliability of these estimates at the trial planning stage. This evaluation may inform the acceptability of the proposed adaptive design or the interpretation of trial results. In some cases, methods are available that adjust estimates to reduce or remove bias associated with the adaptation and these are preferred.

4.3 Population Selection

In certain settings, there may be remaining uncertainty about the patient population who should be treated with a new treatment. For example, a treatment may be expected to benefit a certain targeted subset of the overall population, while the benefit in the non-targeted (complementary) subset may be unclear. This targeted subpopulation could be defined, for example, by demographic characteristics or by a genetic or pathophysiologic marker that is assumed to be related to the treatment’s mechanism of action. If the treatment were truly efficacious in the targeted subpopulation but not efficacious or minimally efficacious in the complementary subpopulation, conducting a trial in the overall population might have insufficient power to establish a treatment effect and might unnecessarily expose participants to a treatment from which they will not receive benefit. On the other hand, if the treatment were truly efficacious in the overall population, a trial in only the targeted subpopulation would not provide data on the effects of the treatment in the complementary subpopulation and would result in restricting the indication for the treatment to only a subset of the overall population that would benefit. Such uncertainty would usually be investigated in an exploratory trial. However, in some cases consideration may also be given to conducting a confirmatory trial in the overall population, with an analysis plan that includes evaluation of efficacy in a targeted subpopulation (e.g., with a multiple testing approach to control the Type I error probability across analyses in the overall population and in the subpopulation). Alternatively, it may be more efficient to consider a design for a confirmatory trial with the option for adaptations to the patient population based on unblinded interim results.

A trial might enroll participants from the overall population up through an interim analysis, at which time a decision would be made whether to continue enrollment in the overall population or to restrict future enrollment to a targeted subpopulation. If enrollment continues in the overall population, a decision would then need to be made whether to evaluate in the analysis at trial end the treatment effect in only the overall population, or in both the overall population and the targeted subpopulation. If enrollment is restricted to the targeted subpopulation, the analysis at the end of the trial would focus on the treatment effect in that subpopulation. In such settings, data accumulated both before and after the interim analysis should be appropriately combined to draw inference on the treatment effect in the selected population(s).

Adequate planning of such designs should include pre-specification of the candidate population(s) that may be selected at the interim analysis to be the target of future enrollment, the decisions to be made at the interim analysis regarding the population(s) for statistical inference and how they will be analyzed at the end of the trial, and the anticipated adaptation rules. There should also be a plan for managing participants from a population for which further enrollment and evaluation is stopped based on an interim analysis. In designs that select population(s) for enrollment and analysis based on interim treatment effect estimates, specific statistical methodology is typically needed to control the Type I error probability. Methods are generally recommended that allow flexibility in deviating from the anticipated adaptation rule, as considering the totality of information available at the interim analysis helps ensure appropriate population selection. Sponsors should also ensure that interim estimates can facilitate reliable population selection, including planning the interim analysis at an appropriate time point. Furthermore, given that such a design tends to select population(s) with more favorable interim results, conventional treatment effect estimates at trial end may be biased. The reliability of the treatment effect estimates in the different populations should be evaluated, and adjusted estimates that reduce or remove bias should be considered.

It is important that a trial with population adaptation has a sound scientific rationale. For example, a trial in the overall population that includes an interim analysis to potentially focus future enrollment and analysis on a particular subpopulation should be motivated by results from previous trials and/or biologic evidence that the benefit-risk profile may be meaningfully more favorable in the targeted subpopulation. With such a trial, it is also important to ensure that the design facilitates reliable decision-making in the scenario in which enrollment in the overall population continues after the interim analysis. This includes ensuring that the trial will provide adequate information on the benefit-risk profile in the complementary subpopulation. It also includes specifying criteria, including criteria for the estimated treatment effect in the complementary subpopulation, that would justify a conclusion of benefit in the overall population. If the baseline characteristic that may be used to define subpopulations is not binary in nature, justification should be provided at the planning stage for any threshold(s) used to define the subpopulations.

4.4 Treatment Selection

Some trials are conducted with the intent to evaluate more than one treatment. The multiple treatments might be different drugs or different doses of a single drug. For example, there might be uncertainty remaining at the end of the exploratory development phase about the benefit-risk profile of two likely efficacious doses of a certain drug. A confirmatory trial might then compare these two doses against control with the objective to confirm their efficacy and to select the most appropriate dose(s) at trial end. In such a setting, it may be conceivable to design a trial with the option for dose selection based on an interim analysis of accumulating unblinded data. Participants would initially be randomized to either of the two doses or control. At the interim analysis, one or both doses would be selected for continued randomization in the second stage. The analysis at the end of the trial would then aim to confirm efficacy and assess benefit-risk of the selected dose(s) based on data across both trial stages.

Adequate planning of a trial with adaptive treatment selection should involve specification of the treatments that will be evaluated, the decisions to be made at the interim analysis, and the anticipated rules for the selection process, including any implications for the randomization scheme and overall sample size. There should also be a plan for managing participants who are receiving a treatment for which further evaluation is stopped based on an interim analysis. In a design that potentially selects one (or more) treatments based on interim effect estimates, specific statistical methodology is needed to control the Type I error probability. It is generally recommended to use methods that allow for flexibility in deviating from the anticipated adaptation rule. Such flexibility enables consideration of the full scope of information available at the interim analysis, helping to support more informed and appropriate treatment selection decisions. Sponsors should also ensure that interim estimates can facilitate reliable treatment selection, including planning the interim analysis at an appropriate time point. Given that such a design tends to select treatment(s) with more favorable interim results, conventional treatment effect estimates at trial end may be biased. The reliability of estimates should be evaluated, and adjusted estimates that reduce or remove bias should be considered.
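
The selection bias described above can be illustrated with a small simulation. The sketch below assumes two doses with an identical true effect, normally distributed stagewise effect estimates with unit standard error, equal information in both stages, and selection of the dose with the larger interim estimate. All names and settings are hypothetical.

```python
import random
from statistics import fmean


def simulate_selection_bias(true_effect: float = 0.0, reps: int = 20000,
                            seed: int = 7):
    """Return (bias of naive pooled estimate, bias of stage-2-only estimate)."""
    rng = random.Random(seed)
    naive_errors = []
    stage2_errors = []
    for _ in range(reps):
        # Stage 1: one effect estimate per dose; both doses are equally effective.
        stage1 = [rng.gauss(true_effect, 1.0) for _ in range(2)]
        selected_stage1 = max(stage1)            # carry forward the better-looking dose
        # Stage 2: independent estimate for the selected dose only.
        stage2 = rng.gauss(true_effect, 1.0)
        naive = 0.5 * (selected_stage1 + stage2)  # conventional pooling across stages
        naive_errors.append(naive - true_effect)
        stage2_errors.append(stage2 - true_effect)
    return fmean(naive_errors), fmean(stage2_errors)


if __name__ == "__main__":
    naive_bias, stage2_bias = simulate_selection_bias()
    print(f"naive pooled bias: {naive_bias:.3f}, stage-2-only bias: {stage2_bias:.3f}")
```

In this setting the naive pooled estimate overstates the effect on average because the first-stage estimate of the selected dose is the larger of two noisy estimates. The estimate based on second-stage data alone is unbiased here but discards first-stage information, which is one reason adjusted estimators that use data from both stages may be considered.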

4.5 Adaptation to Participant Allocation

In a randomized trial, participants are typically allocated to treatment arms according to fixed randomization probabilities. Alternatively, there are different approaches that can be considered to incorporate adaptations to the allocation scheme, where the assignment of participants to treatment arms depends on the data of earlier trial participants. These include covariate-adaptive approaches where assignment depends on accumulating baseline covariate data and response-adaptive approaches where assignment depends on accumulating outcome data. This section focuses on response-adaptive randomization (RAR) approaches where incoming participants are randomized to treatments according to probabilities that depend on previous unblinded outcome data. The key idea is to assign new participants with greater probability to treatment arms that have had, to that point, more positive outcomes than to other treatment arms.
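
To make the key idea concrete, the following sketch computes allocation probabilities for a binary response endpoint that favor arms with better outcomes so far. The specific rule (probabilities proportional to the posterior mean response rate under a uniform Beta(1, 1) prior) is an assumption chosen for simplicity. It is not a procedure described or endorsed in this guideline, and the function names are hypothetical.

```python
import random


def rar_allocation_probabilities(successes, totals):
    """Allocation probabilities proportional to Beta(1, 1) posterior mean response rates."""
    if len(successes) != len(totals):
        raise ValueError("successes and totals must have the same length")
    posterior_means = [(s + 1.0) / (n + 2.0) for s, n in zip(successes, totals)]
    norm = sum(posterior_means)
    return [m / norm for m in posterior_means]


def assign_next_participant(successes, totals, rng: random.Random) -> int:
    """Randomly assign the next participant to an arm index using the current probabilities."""
    probs = rar_allocation_probabilities(successes, totals)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(11)
    # Hypothetical unblinded interim data: arm 0 has 3/10 responders, arm 1 has 8/10.
    print(rar_allocation_probabilities([3, 8], [10, 10]))
    print(assign_next_participant([3, 8], [10, 10], rng))
```

Assignment remains random under this rule; only the probabilities change as outcome data accumulate.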

RAR is sometimes valued for advantages to trial participants such as exposure of fewer participants to an inferior treatment or reduction in the expected number of participant treatment failures in a trial with a binary response endpoint. However, RAR procedures also bring challenges in ensuring valid statistical inference. Perhaps most concerning, RAR designs are susceptible to bias and inflation of the Type I error probability in the presence of overall time trends. For example, an RAR design would more likely show a false positive treatment effect if earlier-enrolled participants are both more likely to be assigned to control and to have a poor prognosis (e.g., because of changes in background care or participant characteristics over time) than later-enrolled participants. In addition, the use of efficacy-based algorithmic modifications to the randomization scheme could lead to an insufficient sample size to support decision-making on a treatment that may have lesser efficacy but a better benefit-risk profile. Any proposal to use RAR should address these potential issues. The specific RAR procedure should be pre-specified and justified. There should be careful specification of analysis methods that provide Type I error probability control and reliable estimates. The proposal should address the potential for confounding due to time trends. The degree of such confounding may depend on factors such as the expected duration of the trial and the likelihood of changes in background care or prognostic factors over time (e.g., such changes may be likely in a rapidly evolving infectious disease setting). One approach that controls the Type I error probability is to allow randomization ratio adaptation at only a single or small number of interim analyses, while utilizing adaptive hypothesis testing based on pre-specified weights for combining the information across trial stages. Time trends may also be addressed by using specific methodology (e.g., re-randomization tests), but an RAR design using such tests might be less powerful than a design with a fixed randomization scheme.

An approach that implements the changes to the randomization scheme over time without sponsor involvement should be planned to reduce the risk to trial integrity. Given that knowledge of the RAR procedure and the adaptively selected randomization ratio could reveal information about the interim treatment effect estimate, steps should be taken to minimize what can be inferred from the adaptations (Section 3.5). Finally, there can be additional challenges such as ensuring the timely availability of high-quality interim data on an ongoing basis and integrating the algorithm into the randomization system.

There are also non-randomized, deterministic adaptations to participant allocation such as in a two-arm trial where a response results in assigning the next participant to the same treatment, while a non-response leads to assigning the next participant to the alternative treatment. Such deterministic procedures are discouraged (ICH E9) due to the high risk of bias and the potential for predicting the next treatment allocation.

5. SPECIAL TOPICS AND CONSIDERATIONS

This section expands on some special topics for adaptive designs, including data monitoring, simulations, use of Bayesian methods, time-to-event endpoints, exploratory trials, and operational execution.

5.1 Further Considerations on Data Monitoring

This section discusses further considerations related to data monitoring in confirmatory trials with adaptive designs that include interim analyses based on accumulating unblinded data. An IDMC for a trial with an adaptive design should contain, as a group, all expertise needed for making adaptation recommendations in addition to meeting its usual responsibilities (i.e., protecting individual participants’ safety while maintaining trial integrity). It should include at least one statistician knowledgeable and experienced in interim monitoring and in statistical methodologies relevant to the proposed adaptive design and analysis. The IDMC should generally have access to unblinded efficacy and safety data. Operational aspects should be outlined in a designated charter to document details such as content and frequency of reports to be prepared, meeting schedule and logistics, procedures to maintain confidentiality, statistical aspects of the monitoring plan, and processes for making recommendations. It is important that sponsors align upfront with the IDMC on the trial objectives and design, expectations for the IDMC (including those that go beyond the usual responsibilities), type and implications of adaptations, and anticipated adaptation rules.

An independent statistical group that conducts analyses of accumulating unblinded data and produces interim reports for the IDMC should be in place. It should not include members of the monitoring committee and should not support other trial activities. Trial integrity will be best protected when this statistical group having access to unblinded data is external to and independent from the sponsor. The statisticians and programmers that comprise this group should have the appropriate expertise to carry out the analyses needed to implement the adaptive design and to support the IDMC. They should have access to all trial data needed to carry out their responsibilities. It is strongly recommended that the independent statistical group and IDMC have sole access to unblinded interim data and results. Appropriate processes for maintaining confidentiality (e.g., standard operating procedures and confidentiality agreements) should be in place.

Upon reviewing the unblinded interim results, the IDMC should provide adaptation recommendations to designated sponsor personnel separated from the trial team. In the specific case that the IDMC has made a recommendation to stop a trial early, sufficient information may then be communicated to the sponsor (e.g., key efficacy and safety results) to allow sponsor decision-making about whether to stop the trial. In general, however, the adaptations should be planned such that the sponsor can implement the IDMC recommendations regarding trial adaptations without having access to any unblinded interim results. For example, this would be the case when the IDMC recommends continuing the trial in a group sequential design or when it selects a specific sample size in a sample size adaptation design. This requires extensive planning and discussion between the sponsor and the IDMC at the planning stage to ensure a common understanding of the monitoring processes and anticipated adaptation rule.

Risks to trial integrity are most easily minimized by completely restricting sponsor access to unblinded interim results. However, sponsors may propose some degree of access to unblinded data in certain circumstances. This should be made explicit at the planning stage. Any proposal for sponsor access needs to be supported by a compelling rationale. In this case, there also should be planned steps to protect trial integrity such as minimizing the number of individuals with access, ensuring individuals with access are independent from those involved in trial conduct, and implementing processes to maintain confidentiality. All information regarding who accessed what data should be recorded in detail so that regulators assessing trial results before and after the adaptation can be reassured at the end of the trial that trial integrity was not compromised.

5.2 Planning, Conducting, and Reporting Simulation Studies

Simulation studies often play an important role in the planning of a trial with an adaptive design. A simulation is the repeated execution of a large number of hypothetical clinical trials to understand operating characteristics of a trial design under a series of specific configurations of assumptions (scenarios). Simulations can be used to investigate operating characteristics of a proposed adaptive design in different scenarios, such as under different treatment effect and nuisance parameter assumptions, in the presence of varying dropout or enrollment rates, or with a specific sample size when analytical properties of an analysis approach rely on large sample sizes. For example, the probability of a false positive conclusion can be estimated by calculating the proportion of simulated clinical trials that would lead to a false positive conclusion that a treatment is effective when data have been simulated under the assumption of no beneficial treatment effect. Simulations can facilitate comparisons of adaptive and non-adaptive designs, comparisons of different adaptive design options, and comparisons of different drug development programs (i.e., a comparison of a sequence of trials). Simulations can also inform internal sponsor decision-making on trial logistics such as site selection and drug supply. This section focuses on principles for the appropriate planning, conduct, and reporting of simulations when they are critical for understanding the operating characteristics of a trial with an adaptive design.
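
The calculation of a false positive proportion can be sketched as follows for a simple design with one interim analysis at half of the information and a final analysis. The simulation works directly with z-statistics under the assumption of no treatment effect. The critical values 2.797 and 1.977 are commonly tabulated two-look O'Brien-Fleming-type boundaries for a one-sided alpha of 0.025, and testing at 1.96 at both looks is included only as an unadjusted comparator. The function name and settings are illustrative assumptions.

```python
import math
import random


def simulate_false_positive_rate(interim_crit: float, final_crit: float,
                                 reps: int = 50000, seed: int = 12345) -> float:
    """Proportion of simulated null trials that declare efficacy at either look."""
    rng = random.Random(seed)                    # documented seed for reproducibility
    false_positives = 0
    for _ in range(reps):
        z_interim = rng.gauss(0.0, 1.0)          # statistic from the first half of the data
        z_increment = rng.gauss(0.0, 1.0)        # statistic from the second half alone
        z_final = (z_interim + z_increment) / math.sqrt(2.0)
        if z_interim >= interim_crit or z_final >= final_crit:
            false_positives += 1
    return false_positives / reps


if __name__ == "__main__":
    adjusted = simulate_false_positive_rate(2.797, 1.977)
    unadjusted = simulate_false_positive_rate(1.96, 1.96)
    print(f"group sequential boundaries: {adjusted:.4f}")
    print(f"unadjusted repeated testing: {unadjusted:.4f}")
```

With the adjusted boundaries the estimated proportion stays close to 0.025, whereas testing twice at 1.96 yields a clearly higher proportion. A simulation used in a regulatory submission would need to generate participant-level data under the full range of scenarios discussed in this section. This sketch only shows the counting logic.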

It is important to clearly define and focus on the key objectives the simulation study is designed to address. These should be specific, relevant, and directly related to the decisions that will be made as a result of the simulation study. To address the objectives, a range of clinical trial designs and analysis options should be carefully selected. These should include a benchmark design and analysis approach, i.e., a design with well-understood operating characteristics such as a non-adaptive or group sequential design. This range of designs may also include, for example, different choices for the number and timing of interim analyses, stagewise sample sizes, types of adaptations, stopping and adaptation rules, and statistical methods for testing and estimation. The choice of design options may be an iterative process as operating characteristics are explored and should be sufficiently broad to allow a comprehensive assessment of the selected adaptive design. The evaluation of the advantages and limitations of all design options included in the simulation study is critical to understand the tradeoffs in the selection of the proposed design.

It is also important to define and assess key operating characteristics that align with the questions the simulation study is designed to address. These operating characteristics should generally include the Type I error probability, expected sample size, expected trial duration, power, coverage of confidence intervals, and bias and mean squared error of treatment effect estimates. Other operating characteristics such as the probability of stopping for futility or efficacy at an interim analysis may also be of interest, depending on the trial design and setting. Considerations around operating characteristics for adaptive designs using Bayesian methods are discussed in Section 5.3. Sometimes, operating characteristics beyond a single trial may be of interest, such as the probability of selecting an appropriate dose and subsequently confirming its efficacy. While it is relevant to summarize the average of the results across the simulated trials (repetitions), it may also be important to evaluate the variability, minimum and maximum, or other aspects of the distribution of results (e.g., the sample size distribution in a trial with the potential for early stopping or sample size adaptation).

The scenarios included in the simulation study should cover the plausible range of assumptions to ensure a robust assessment of the performance of the proposed adaptive design. This includes assumptions about the treatment effects and nuisance parameters, such as the standard deviation for a continuous outcome, and operational assumptions for which a sponsor may have greater control (e.g., enrollment or dropout rates). The adequacy of the assumptions should be justified based on clinical and statistical considerations, with documentation of the supporting knowledge. This information can come from a variety of sources, including data from previous trials, publications, results from extrapolations, and expert input. All relevant sources of information available to the sponsor should be used, and attempts should be made to quantify uncertainty and identify potential biases. Using a grid of assumptions (e.g., discrete set of assumptions across a specific range) should be supported by justification based on existing clinical knowledge that the range evaluated in the grid covers all plausible scenarios. It is also important to justify (e.g., based on monotonicity arguments) that the grid is fine enough (i.e., that a sufficient number of different assumptions are included within the range) to provide a reliable estimate of the operating characteristics of interest. Sources of information based on robust evidence and understandable from a clinical perspective will make the simulation study results more interpretable and convincing.

It is essential that the simulated scenarios comprehensively cover the plausible range of nuisance parameter configurations. For example, in using simulations to investigate the Type I error probability, it is impossible to simulate under every nuisance parameter configuration consistent with no beneficial treatment effect, even in the simplest trial designs. Thus, there is additional uncertainty for designs in which simulations are critical to understand the Type I error probability. Given the additional uncertainty, additional justification is expected to support such designs.

Implementation details of the simulation study should be described and justified. This includes clear specification of the data-generating process. In many cases, a simple statistical model, such as a normal distribution with mean and variance obtained from previous trials, may be appropriate. In other cases, a more complex model fit based on earlier trial results (e.g., longitudinal patient profiles) may be considered. This also includes determining the number of repetitions needed to get sufficient precision in the estimation of important operating characteristics. More precision may be needed for certain operating characteristics or scenarios. For example, it may be important to use 100,000 or more repetitions per scenario to ensure sufficient precision for estimating the Type I error probability, whereas fewer repetitions may suffice for other operating characteristics such as power. Algorithms should be documented and random numbers should be generated in a reproducible way, such as using a documented seed.
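
The link between the number of repetitions and precision can be made explicit with the binomial standard error of a simulated proportion. The sketch below assumes independent repetitions, and the function names are illustrative.

```python
import math


def monte_carlo_standard_error(true_rate: float, repetitions: int) -> float:
    """Standard error of a proportion estimated from independent simulated trials."""
    return math.sqrt(true_rate * (1.0 - true_rate) / repetitions)


def repetitions_for_standard_error(true_rate: float, target_se: float) -> int:
    """Smallest number of repetitions giving a standard error at or below target_se."""
    return math.ceil(true_rate * (1.0 - true_rate) / target_se ** 2)


if __name__ == "__main__":
    for reps in (10_000, 100_000):
        se = monte_carlo_standard_error(0.025, reps)
        print(f"{reps} repetitions: standard error {se:.5f}")
```

For a true Type I error probability of 0.025, 100,000 repetitions give a standard error of about 0.0005, compared with about 0.0016 for 10,000 repetitions. This is why a small inflation of the Type I error probability can only be detected with a large number of repetitions.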

Finally, it is important to document the design, results, and conclusions of the simulation study. A comprehensive and structured report of the simulations should be included in regulatory submissions prior to conducting the trial (Section 6.1). There should be explicit links between clinical and statistical assumptions and results of the simulations. The report should align with the considerations outlined in this section and include the following:

Key questions the simulation study is designed to address.
The clinical trial design and analysis options evaluated in the simulation study.
The choice of operating characteristics assessed in the simulation study.
Existing knowledge, and any supporting data or references, to inform the simulation scenarios.
The set of parameter configurations used for the simulation scenarios, along with a clinical justification based on existing knowledge that the set adequately covers the plausible range of values for the different parameters.
Implementation details, including the data-generating process and the number of repetitions for each scenario, along with justifications for these choices.
Software package used for simulations and, if custom software was used, the simulation code. When code is provided, it should have adequate comments with detailed instructions on how to execute the code (e.g., an example call and the starting seed).
A summary providing overall results, interpretations, and conclusions. This should include a detailed discussion of the proposed adaptive design and its estimated operating characteristics under the various scenarios. Summarizing results in interactive graphs, where possible, can help make the results more accessible.
A description of relevant examples of single simulated clinical trials with different adaptations and conclusions. For example, in a design with sample size adaptation, this might include trials with different sample size modifications at the interim analysis and with positive or negative primary analysis results to facilitate a better understanding of potential interim decisions and their impact on the trial results.
A description of any aspects that limit the interpretation of the simulation results (e.g., uncertainty in assumptions or extrapolations).
A clinical discussion about whether and to what extent the simulation results address the key questions.

The careful documentation of simulation studies is also critical because the validity of the simulations and associated conclusions will be part of the regulatory review of results at the end of the trial.

5.3 Adaptive Designs Using Bayesian Methods¹

¹ This section on Bayesian methods for adaptive designs is not fully harmonized. The broad use of Bayesian methods may not be justified in all situations for regulatory decision-making. As noted in ICH E9 and in this draft guideline, the use of Bayesian methods in clinical trials may be considered when the reasons for their use are clear and when the resulting conclusions are sufficiently robust. Public consultation comments are sought on the topic, and on situations in which Bayesian methods satisfy the core adaptive design principles, and in which the use of Bayesian methods could be considered.

ICH E9 notes that the use of Bayesian methods in clinical trials may be considered when the reasons for their use are clear and when the resulting conclusions are sufficiently robust. Bayesian methods refer to a wide range of statistical approaches that combine a prior probability distribution with current trial data to obtain a posterior probability distribution for a quantity of interest (e.g., the treatment effect or estimand). Bayesian methods are potentially applicable to a variety of adaptive designs. The principles outlined in Section 3 should be followed regardless of the specific statistical approach. There are different types of application of Bayesian methods to clinical trials with an adaptive design, each with different considerations.

Bayesian methods can be used to inform adaptations in a trial where decision criteria for the primary analysis are chosen to ensure that the Type I error probability is controlled. For example, a trial might include interim analyses with pre-specified non-binding futility stopping rules based on a scale such as the posterior probability that the treatment is inefficacious or the predictive probability of rejecting the null hypothesis at trial end, where the primary efficacy analysis is performed with a frequentist hypothesis test at a pre-specified significance level. For such designs, expectations for operating characteristics are the same as for adaptive designs that do not involve Bayesian methods. Sponsors should justify that the prior distribution, decision criteria, and adaptive design elements (e.g., number and timing of interim analyses and adaptation rules) can achieve targeted operating characteristics (e.g., power, expected sample size, reliability of adaptation decisions) while maintaining Type I error probability control.
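
The two futility scales mentioned above can be sketched for the simplest possible case. This is a hedged illustration, not a method prescribed by this guideline: it assumes a normally distributed test statistic, a flat (non-informative) prior on the treatment effect, a single final one-sided z-test with no efficacy stopping, and a hypothetical futility threshold of 0.10. All function and parameter names are invented for the example.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def posterior_prob_no_benefit(z_interim: float) -> float:
    """P(effect <= 0 | interim data), assuming a flat prior on the effect."""
    return _STD_NORMAL.cdf(-z_interim)


def predictive_prob_success(z_interim: float, info_fraction: float,
                            alpha: float = 0.025) -> float:
    """Predictive probability that the final z-test rejects the null hypothesis.

    Assumes a flat prior on the effect and a final one-sided test at level
    alpha; info_fraction is the proportion of total information at the interim.
    """
    if not 0.0 < info_fraction < 1.0:
        raise ValueError("info_fraction must lie strictly between 0 and 1")
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha)
    numerator = z_interim - z_crit * math.sqrt(info_fraction)
    return _STD_NORMAL.cdf(numerator / math.sqrt(1.0 - info_fraction))


def futility_flag(z_interim: float, info_fraction: float,
                  threshold: float = 0.10) -> bool:
    """Non-binding futility signal: True when predictive probability is below threshold."""
    return predictive_prob_success(z_interim, info_fraction) < threshold
```

Because the rule is non-binding and the primary analysis remains a frequentist test at the pre-specified level, continuing the trial despite a futility signal does not inflate the Type I error probability in this sketch. The threshold itself would still need to be justified through its effect on power and expected sample size.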

A special case is the use of adaptive design elements in the context of clinical trials that use Bayesian methods to borrow external information based on an informative prior distribution, with decision criteria for the primary analysis based on the posterior distribution for the estimand of interest (i.e., a threshold on the posterior probability for efficacy). Borrowing of external data to inform inference requires a thorough scientific justification that addresses the feasibility of alternative approaches not involving borrowing (e.g., design and conduct of a fully powered trial without using external data) and supports the relevance and quality of the external data. Misspecification of the prior distribution can lead to lack of control of the probability of false positive conclusions. Ensuring that a prior accurately reflects relevant available information and addressing the potential for conflict between prior and current trial data introduce additional uncertainties that are not present when using frequentist inference with no borrowing.

For such designs, sponsors should discuss and document in the protocol the source of the external information used to generate the prior, the relevance of the external information to the trial design (e.g., whether the populations and concomitant care are sufficiently similar, and the endpoints are the same), the list of all potentially relevant sources of information, and why selected information sources were used and other potentially relevant sources were discarded. Input from clinical subject matter experts is crucial for evaluating the relevance of external information. When considering the source of external information, data from randomized controlled trials and recent data are generally preferred. Patient-level data are generally expected because they allow a thorough evaluation at the planning stage of the relevance of the external information and may facilitate strategies to address potential conflict between the prior and current trial data at the assessment stage.

Sponsors should pre-specify and justify the details of a proposed prior distribution, including the amount of borrowing from the external data, as well as the criteria for defining trial success. The prior and decision criteria should ensure the design fulfills the principles in Section 3.3, including control of the chances of false positive conclusions. The justification for the prior should include a discussion of the balance between the prior and trial data and strategies to mitigate the risk that observed trial data may conflict with the prior. There should be a sufficient amount of current trial data to support benefit-risk assessment. Simulations should be performed to evaluate the chances of erroneous conclusions, including the chances of false positive conclusions, under various scenarios of prior-data conflict. There should be a discussion at the planning stage about the maximum amount of borrowing and the relationship between observed conflict and the degree of borrowing, including circumstances that would question the relevance of the external data and lead to no borrowing. Sensitivity analyses should also be planned to investigate the robustness of the trial conclusions against alternative reasonable choices for the prior distribution. It is also important to evaluate the current trial data with no borrowing.
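
The effect of prior-data conflict on the chance of a false positive conclusion can be illustrated in a deliberately simple conjugate setting. The sketch below is an assumption-laden example rather than a recommended approach: it assumes a normal prior with fixed (non-adaptive) borrowing, a normally distributed effect estimate with known standard error, a hypothetical success criterion of posterior probability of a positive effect above 0.975, and a true effect of zero. In this toy case the false positive probability has a closed form, so no simulation is needed; realistic designs with dynamic borrowing would generally require simulation.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def false_positive_probability(prior_mean: float, prior_sd: float,
                               data_se: float, threshold: float = 0.975) -> float:
    """Probability of declaring success when the true effect is zero.

    Model (all assumptions): effect ~ N(prior_mean, prior_sd^2) a priori,
    estimate | effect ~ N(effect, data_se^2), success if
    P(effect > 0 | estimate) > threshold. prior_sd = math.inf means no borrowing.
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / data_se ** 2
    post_precision = prior_precision + data_precision
    z_thr = _STD_NORMAL.inv_cdf(threshold)
    # Smallest estimate for which posterior mean / posterior sd exceeds z_thr.
    critical_estimate = (z_thr * math.sqrt(post_precision)
                         - prior_mean * prior_precision) / data_precision
    # Under a true effect of zero, the estimate is N(0, data_se^2).
    return 1.0 - _STD_NORMAL.cdf(critical_estimate / data_se)
```

With no borrowing (`prior_sd = math.inf`) the sketch returns 0.025, matching a one-sided frequentist test. With an optimistic prior (for example `prior_mean = 0.3`, `prior_sd = 0.15`, `data_se = 0.1`, all hypothetical values) and a true effect of zero, the false positive probability rises to about 0.15, which illustrates why the maximum amount of borrowing and the handling of conflict need to be discussed at the planning stage.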

5.4 Adaptive Designs in Time-to-Event Settings

There are additional considerations specific to trials in which the primary endpoint is the time to occurrence of a certain event. In such time-to-event trials, the statistical power of the trial depends on the number of events rather than the number of participants. It is therefore common for such trials to target a fixed number of events when calculating the sample size at the trial planning stage. In addition, the follow-up time of participants is often unspecified, meaning that the trial does not have a fixed observation period, and all participants are followed until a certain number of events have occurred. For trials with adaptive designs in time-to-event settings, interim analyses are therefore often planned at target numbers of events rather than target sample sizes. Furthermore, a sample size adaptation based on an interim treatment effect estimate in a time-to-event trial may entail modification of the initially planned number of events. For example, targeting a larger number of events than originally planned could be achieved by simply waiting longer for events to occur (i.e., allowing for longer follow-up times) with the originally planned number of trial participants. Alternatively, the number of trial participants could be increased, or both approaches could be applied. In considering increases in the number of trial participants relative to the number of events, sponsors should ensure that sufficient data will be available for the benefit-risk assessment (e.g., to understand longer term treatment effects and to evaluate relevant subgroups of the patient population, including those with lower background risk of the event).
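
The dependence of power on the number of events, rather than the number of participants, can be illustrated with the widely used Schoenfeld approximation for a log-rank test. This approximation is not part of this guideline; the sketch assumes proportional hazards and a one-sided test, and the function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def required_events(hazard_ratio: float, alpha: float = 0.025,
                    power: float = 0.90, allocation: float = 0.5) -> int:
    """Approximate number of events needed for a log-rank test (Schoenfeld).

    allocation is the proportion of participants randomized to the
    experimental arm; alpha is the one-sided significance level.
    """
    if hazard_ratio <= 0.0 or hazard_ratio == 1.0:
        raise ValueError("hazard_ratio must be positive and different from 1")
    if not 0.0 < allocation < 1.0:
        raise ValueError("allocation must lie strictly between 0 and 1")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha)
    z_beta = std_normal.inv_cdf(power)
    events = (z_alpha + z_beta) ** 2 / (
        allocation * (1.0 - allocation) * math.log(hazard_ratio) ** 2
    )
    return math.ceil(events)
```

For example, `required_events(0.7)` gives 331 events under these assumptions. The number of participants does not appear in the formula: the same event target could be reached with fewer participants followed for longer or with more participants followed for a shorter time, which is the trade-off discussed above.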

Adaptive designs are most straightforward when each trial participant only takes part in one stage of the trial. If the data collected prior to an interim analysis are completely independent of the data collected afterwards, a statistical analysis combining all information can proceed in a relatively simple way. In a time-to-event setting, however, some trial participants may be enrolled and remain event-free in one stage, but may contribute an event in a later stage. Using information (e.g., on secondary endpoints) from participants who have been enrolled in the trial but not yet experienced the event of interest at an interim analysis to inform potential adaptations violates the independence assumption and can inflate the Type I error probability (even when using adaptive test statistics). Therefore, it is important to define plans with specific methodology for maintaining the Type I error probability. One option is to fully pre-specify an adaptation rule that relies on only the primary endpoint, without the possibility of deviations from such a rule. Another option is to use special methods that involve defining stages based on the sets of participants enrolled before and after the interim analysis, while also setting in advance either a fixed follow-up time or a fixed number of events for each stage. Alternatively, rather than incorporating adaptations to the number of events, sponsors can consider a design that targets a larger number of events and includes the option to stop the trial early at an interim analysis. Similar conceptual problems and respective considerations also apply to adaptive designs with longitudinal outcomes, as using surrogate or intermediate outcome information on participants who have not completed all follow-up visits at the interim analysis can increase the Type I error probability unless appropriate analysis methods are used.
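
One well-known example of an adaptive test statistic is the inverse normal combination of stage-wise p-values. The sketch below is an illustration under stated assumptions, not a method endorsed by this guideline for any particular trial: it assumes two stages, one-sided stage-wise p-values that are independent and uniformly distributed under the null hypothesis, pre-specified weights, and no early stopping for efficacy. The function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def inverse_normal_combination(p_stage1: float, p_stage2: float,
                               weight_stage1: float) -> float:
    """Combine two one-sided stage-wise p-values into a single z-statistic.

    weight_stage1 must be fixed before the trial starts; the stage 2 weight
    is chosen so that the squared weights sum to one.
    """
    for p in (p_stage1, p_stage2):
        if not 0.0 < p < 1.0:
            raise ValueError("p-values must lie strictly between 0 and 1")
    if not 0.0 < weight_stage1 < 1.0:
        raise ValueError("weight_stage1 must lie strictly between 0 and 1")
    weight_stage2 = math.sqrt(1.0 - weight_stage1 ** 2)
    return (weight_stage1 * _STD_NORMAL.inv_cdf(1.0 - p_stage1)
            + weight_stage2 * _STD_NORMAL.inv_cdf(1.0 - p_stage2))


def reject_null(p_stage1: float, p_stage2: float, weight_stage1: float,
                alpha: float = 0.025) -> bool:
    """Final decision without early efficacy stopping (an assumption of this sketch)."""
    z = inverse_normal_combination(p_stage1, p_stage2, weight_stage1)
    return z > _STD_NORMAL.inv_cdf(1.0 - alpha)
```

The validity of such a statistic rests on the independence of the stage-wise p-values under the null hypothesis. In a time-to-event trial, if the stage 2 p-value includes events from participants enrolled in stage 1, and interim information on those participants influenced the adaptation, that independence no longer holds. This is why the stage definitions described above are based on the sets of participants enrolled before and after the interim analysis.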

5.5 Adaptive Designs in Exploratory Trials

This guideline focuses on the use of adaptive designs in confirmatory clinical trials. If a trial may be intended to confirm efficacy and support benefit-risk assessment, it is critical that the principles in Section 3 are followed. Adaptive designs may also be used in exploratory trials early in drug development that are intended to obtain information on a wide range of aspects of treatment use (e.g., choices of dose, regimen, population, endpoints). Trials at this stage of the development program may include a larger number of adaptations to generate information that supports important decisions about subsequent development phases. The principles in this guideline are also relevant in these settings to ensure the reliability and interpretability of the results and subsequent decision-making based on such trials.

Additional considerations may apply, however, for exploratory trials because independent confirmation of findings will usually follow in one or more separate trials. For example, it may be sufficient that the protocol describes general principles for trial adaptations rather than the specific adaptation rule. This may be appropriate in, for example, dose-escalation trials where model-based dose recommendations are to be considered in the context of other emerging information (e.g., about toxicities that do not qualify as dose-limiting toxicities). In addition, it is critical that exploratory trials with an adaptive design can reliably inform the decisions they are intended to support. For example, providing a convincing basis for decision-making about the appropriate target dose to be investigated in a confirmatory trial is critical as a suboptimal conclusion can have serious consequences for the subsequent development program. Maintaining the integrity of exploratory trials with an adaptive design is also important, but there may be additional considerations for the sponsor’s role in interim decision-making. For example, monitoring of an adaptive dose-ranging trial intended to inform the adequate dose for subsequent confirmatory trials may entail multidimensional adaptation decisions that require considerable input from various disciplines within the sponsor. Sponsors should then take into account the questions a trial intends to answer and its position within the development program, as well as the tradeoffs for sponsor involvement in the monitoring process versus limitation of access to unblinded results to maintain trial integrity. Any monitoring plan should ensure the protection of trial participants’ safety.

5.6 Operational Considerations

Use of an adaptive design can add challenges to the operational execution of a clinical trial and these should be addressed at the trial planning stage. For example, measures should be implemented to minimize the information that can be inferred from an interim analysis to maintain trial integrity (Section 3.5). As another example, informed consent forms should cover the possibility of adaptive changes in the trial. Participants should understand the reasons for such changes (e.g., the goal of selecting the dose with the best benefit-risk profile from among multiple doses at an interim analysis), that these changes reflect improved knowledge about the treatment under investigation, and that their rights and safety remain protected. As yet another example, the infrastructure needed for trials with an adaptive design, such as data management systems, may differ from that of trials with a non-adaptive design. Clinical trials with an adaptive design typically use an interactive voice or web randomization system to manage randomization and assignment of participants to treatment arms. Such systems should be fully integrated into clinical trial operational processes and drug supply chain mechanisms. Pre-specified algorithms should be built into the system to ensure it is capable of handling the foreseeable scenarios (e.g., a change in the treatment arms or randomization ratio) with minimum sponsor involvement. Also, adaptations to the sample size, treatment arms, or participant allocation can lead to drug supply challenges. One such challenge is lead times for manufacturing drugs, as rapid adaptations can strain drug supply chains and lead to delays in participant treatment if sufficient drug supply is not readily available. These challenges may be increased when a clinical trial with an adaptive design spans multiple countries or even regions, as drugs need to be distributed to these locations in a timely manner. Simulations may help support supply-related decisions at planning and execution stages of the trial. Finally, processes should be established at the planning stage to ensure relevant interim data can be appropriately validated and cleaned in a timely manner to ensure quality interim data informing the adaptation decision. This may include requiring a formal interim database lock to ensure completion of data validation and cleaning activities.

6. DOCUMENTATION

6.1 Documentation Prior to Conducting a Confirmatory Trial with an Adaptive Design

Documentation is a critical part of adequate planning of a confirmatory trial and allows a rigorous evaluation of the proposed adaptive design. In addition to the information typically included in a clinical trial protocol or in other documents, where suitable, documentation should include the following:

A rationale for the proposed adaptive design: The rationale should include both clinical and statistical considerations, justifying the proposal to adapt in a confirmatory trial and the adequacy of the proposed trial design within the clinical development program. A discussion of advantages and limitations as compared to alternative designs (including non-adaptive designs) will help regulators evaluate the acceptability of any additional uncertainty attributable to proposed adaptive elements.
A description of the adaptations being proposed: This should include the aspects of the trial that may be modified, the number and timing of interim analyses, and the anticipated rule governing the adaptation decision (e.g., the formula for determining the target sample size as a function of the interim treatment effect estimate, including the minimum and maximum potential sample size, in a design with sample size adaptation). If the design involves selection of an estimand at an interim analysis (e.g., through treatment or population selection), this should include precise definitions of all candidate estimands.
A description of the statistical analysis methods: This should include the methods for producing interim results and guiding adaptation decisions, the statistical approach for primary and secondary analyses (e.g., for hypothesis testing and for estimating treatment effects and corresponding measures of uncertainty), and important sensitivity and supplementary analyses.
A description of how the adaptive design will be implemented: This should include who will carry out interim analyses; who will be responsible for reviewing interim analysis results and making adaptation recommendations and/or decisions; and membership, roles, responsibilities, and operational aspects of any relevant committees.
A description of steps to maintain confidentiality of interim results and protect trial integrity, among other details of the operational execution: This should include processes for information transfer and access; who will have access to unblinded interim results; how access to unblinded interim results will be controlled; what type of information will be disseminated following adaptive decisions, from whom, and to whom; and where records about information access and dissemination will be saved.
A description of important operating characteristics of the design: In cases where simulations are critical for understanding operating characteristics, this should include a report that describes the objective, design, implementation, and results of the simulation study (Section 5.2).
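
To illustrate the level of detail meant by an anticipated adaptation rule with a minimum and maximum potential sample size, a rule of this kind might be written down as in the sketch below. This is a hypothetical example only: it assumes a two-arm trial with a normally distributed endpoint, known standard deviation, and a rule that re-computes the per-arm sample size from the interim effect estimate. The function name, parameters, and the handling of non-positive estimates are invented for the illustration, and such a rule would need to be paired with an analysis method that controls the Type I error probability.

```python
import math
from statistics import NormalDist


def adapted_n_per_arm(delta_hat: float, sigma: float, n_min: int, n_max: int,
                      alpha: float = 0.025, power: float = 0.90) -> int:
    """Per-arm sample size as a function of the interim effect estimate.

    Uses the standard two-sample normal formula evaluated at delta_hat and
    restricts the result to the pre-specified range [n_min, n_max].
    """
    if n_min > n_max:
        raise ValueError("n_min must not exceed n_max")
    if delta_hat <= 0.0:
        # Placeholder choice for this sketch; an actual design might instead
        # stop for futility when the interim estimate is not positive.
        return n_max
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha)
    z_beta = std_normal.inv_cdf(power)
    n_required = 2.0 * sigma ** 2 * (z_alpha + z_beta) ** 2 / delta_hat ** 2
    return min(max(math.ceil(n_required), n_min), n_max)
```

For instance, with `sigma = 1.0`, `n_min = 50`, and `n_max = 200` (hypothetical values), an interim estimate of 0.5 leads to 85 participants per arm, while very large or very small estimates lead to the minimum or maximum, respectively. Writing the rule in this explicit form makes it possible to evaluate its operating characteristics before the trial starts.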

This information should be documented and included in regulatory submissions prior to initiation of the trial, in accordance with applicable national and regional regulatory requirements and practices. The protocol should contain the core elements, including the trial objectives and corresponding estimand(s), and the principal features of the trial design, conduct, and statistical analysis, including all adaptive design elements and their rationale. Some information, such as details on operation of an IDMC and data access processes, may instead be included in a separate document such as an IDMC charter. In some cases, details of the anticipated adaptation rule should be reserved for specific documents with access restrictions, rather than the protocol, to maintain trial integrity (Section 3.5).

6.2 Documentation to Include in a Marketing Application After a Completed Confirmatory Trial with an Adaptive Design

A marketing application for a treatment that relies on a confirmatory clinical trial with an adaptive design should include sufficient documentation to allow a comprehensive review of the trial results. In addition to its typical components, a marketing application should include:

All prospective plans described in Section 6.1.
Information on how the adaptive design was implemented, including the actual number and timing of interim analyses, an evaluation of whether aspects of trial conduct (e.g., baseline characteristics, enrollment rate, adherence, retention) varied notably before and after the interim analysis, the results of interim analyses used for adaptation decisions, any notable heterogeneity between results from different stages of the trial, the adaptation decisions that were made, whether anticipated adaptation rules were followed, and the date of sponsor unblinding. If there was any deviation from the anticipated plan (e.g., in terms of the number or timing of interim analyses or adherence to the anticipated adaptation rule), this should include a discussion of the reasons for the deviation, any measures taken to minimize impact on trial integrity, and any other potential impact on the interpretation of trial results.
Any information on compliance with planned processes for data access and maintaining trial integrity, such as results of any audits and reporting of any known deviations from the processes, along with a discussion of potential implications.
Records of deliberations by the IDMC (e.g., all closed and open IDMC meeting minutes), including records of discussions related to any adaptation decisions.
Reporting of results that appropriately account for the adaptive design (e.g., appropriately adjusted estimates, confidence intervals, and p-values).