Example of dataset format (example ONLY; not actual data)

field / column description
[TYPE]
# or range of non-redundant values:
sort_population indicates the population of the single-domain library where diversity is inserted
[STRING]
  • phase1_h1_h2_am: 5,959 unique VL+VH
  • phase1_l1_l2_am: 23,320 unique VL+VH
  • phase1_l3_am: 3,589 unique VL+VH
sequence_aa_heavy amino acid
[STRING]
  • phase1_h1_h2_am: 31,729
  • phase1_l1_l2_am: 1
  • phase1_l3_am: 1
sequence_aa_light amino acid
[STRING]
  • phase1_h1_h2_am: 1
  • phase1_l1_l2_am: 122,988
  • phase1_l3_am: 30,379
cdr1_aa_heavy [IMGT] amino acid
[STRING]
  • phase1_h1_h2_am: 2,528
  • phase1_l1_l2_am: 1
  • phase1_l3_am: 1
cdr2_aa_heavy [IMGT] amino acid
[STRING]
  • phase1_h1_h2_am: 1,023
  • phase1_l1_l2_am: 1
  • phase1_l3_am: 1
cdr3_aa_heavy [IMGT] amino acid
[STRING]
  • phase1_h1_h2_am: 1
  • phase1_l1_l2_am: 1
  • phase1_l3_am: 1
cdr1_aa_light [IMGT] amino acid
[STRING]
  • phase1_h1_h2_am: 1
  • phase1_l1_l2_am: 4,770
  • phase1_l3_am: 826
cdr2_aa_light [KABAT] amino acid
[STRING]
  • phase1_h1_h2_am: 1
  • phase1_l1_l2_am: 7,047
  • phase1_l3_am: 784
cdr3_aa_light [IMGT] amino acid
[STRING]
  • phase1_h1_h2_am: 1
  • phase1_l1_l2_am: 996
  • phase1_l3_am: 5,585
redundancy # of times full chain appears in NGS at the amino acid level
[INTEGER]
  • phase1_h1_h2_am: [1 – 12,099] [min – max]
  • phase1_l1_l2_am: [1 – 494] [min – max]
  • phase1_l3_am: 1,541  [1 – 968] [min – max]
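
The "non-redundant values" column above can be read as a count of distinct values per sort_population. A minimal sketch of that tally follows, assuming the data has already been loaded as one dictionary per row keyed by the field names in the table. The helper name unique_counts and the toy rows are invented for illustration and are not real sequences.

```python
from collections import defaultdict


def unique_counts(rows, column):
    """Count distinct values of `column` within each sort_population."""
    seen = defaultdict(set)
    for row in rows:
        seen[row["sort_population"]].add(row[column])
    return {population: len(values) for population, values in seen.items()}


# Toy rows, invented for illustration only (not actual data).
rows = [
    {"sort_population": "phase1_h1_h2_am",
     "sequence_aa_heavy": "EVQLA", "sequence_aa_light": "DIQMT"},
    {"sort_population": "phase1_h1_h2_am",
     "sequence_aa_heavy": "EVQLS", "sequence_aa_light": "DIQMT"},
    {"sort_population": "phase1_l3_am",
     "sequence_aa_heavy": "EVQLA", "sequence_aa_light": "DIQMS"},
]

heavy = unique_counts(rows, "sequence_aa_heavy")
light = unique_counts(rows, "sequence_aa_light")
print(heavy)
print(light)
```

In the toy rows the heavy chain varies within phase1_h1_h2_am while the light chain stays fixed, which is the same pattern the table shows when a population reports 1 for a chain or CDR.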

Example of dataset format (example ONLY; not actual data) – This is for challenge 2 AND challenge 3

 

field / column description
[TYPE]
# or range of non-redundant values:
characterized TRUE if characterized by SPR
[BOOL]
TRUE
  • # VL+VH: 142
  • # L3+H3: 74
  • # H3: 55
  • # clusters: 40
FALSE
  • # VL+VH: 31,917
  • # L3+H3: 4,362
  • # H3: 754
  • # H3 clusters: 300
lsa_bin experimentally determined bin group
[INTEGER]
  • values = [1, 2, 5, NA]
cluster_cdr3_heavy unique identifier (e.g., 57F)
[STRING]
  • 300
affinity LSA affinity in molarity (M)
[FLOAT]
  • 2.7×10⁻¹¹ – 2.1×10⁻³ M
  • NA = uncharacterized
  • 1.0×10⁻⁶ M = characterized, weak affinity
on_rate LSA on-rate in (M⁻¹s⁻¹)
[FLOAT]
  • 1.2×10² – 4.5×10⁵ M⁻¹s⁻¹
  • NA = uncharacterized OR characterized, slow on-rate
off_rate LSA off-rate in (s⁻¹)
[FLOAT]
  • 1.0×10⁻⁵ – 0.6×10⁻² s⁻¹
  • NA = uncharacterized OR characterized, fast off-rate
sequence_aa_light amino acid
[STRING]
  • 102aa to 120aa
sequence_aa_heavy amino acid
[STRING]
  • 113aa to 133aa
cdr1_aa_heavy [IMGT] amino acid
[STRING]
  • 7aa to 14aa
cdr2_aa_heavy [IMGT] amino acid
[STRING]
  • 6aa to 9aa
cdr3_aa_heavy [IMGT] amino acid
[STRING]
  • 6aa to 26aa
cdr1_aa_light [IMGT] amino acid
[STRING]
  • 5aa to 12aa
cdr2_aa_light [KABAT] amino acid
[STRING]
  • 6aa to 11aa
cdr3_aa_light [IMGT] amino acid
[STRING]
  • 5aa to 18aa
relative_abundance_10nM relative abundance of the concatenated CDRs in the 10 nM RBD sort round via NGS
[FLOAT]
  • 0.0% – 9.5% is not implied here; range: 0.0% – 6.1%
relative_abundance_1nM relative abundance of the concatenated CDRs in the 1 nM RBD sort round via NGS
[FLOAT]
  • 0.0% – 9.5%
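
The affinity, on_rate and off_rate fields above mix numeric values with the NA marker, so a reader has to handle missing values before doing any arithmetic. The sketch below parses such cells and applies the standard kinetic relation K_D = k_off / k_on as a consistency check. It assumes the values arrive as text with NA spelled exactly as in the table. The helper names, the 5% tolerance and the example record are invented for illustration, and the reported affinity may come from a different fit than the two rates, so a mismatch is not necessarily an error.

```python
import math
from typing import Optional


def parse_float(cell: str) -> Optional[float]:
    """Return None for the NA marker, otherwise the numeric value."""
    cell = cell.strip()
    return None if cell == "NA" else float(cell)


def kd_from_rates(on_rate: Optional[float],
                  off_rate: Optional[float]) -> Optional[float]:
    """K_D (M) = k_off (s^-1) / k_on (M^-1 s^-1); None if either rate is NA."""
    if on_rate is None or off_rate is None:
        return None
    return off_rate / on_rate


# Per the table, 1.0e-6 M marks a characterized but weak binder.
WEAK_AFFINITY_M = 1.0e-6

# Invented record for illustration only (not actual data).
record = {"affinity": "2.0e-9", "on_rate": "1.0e5", "off_rate": "2.0e-4"}

affinity = parse_float(record["affinity"])
kd = kd_from_rates(parse_float(record["on_rate"]),
                   parse_float(record["off_rate"]))
consistent = (kd is not None and affinity is not None
              and math.isclose(kd, affinity, rel_tol=0.05))

# A row with an NA rate yields no derived K_D.
missing = kd_from_rates(parse_float("NA"), parse_float("1.0e-3"))
print(kd, consistent, missing)
```

Because NA in on_rate or off_rate can mean either uncharacterized or outside the measurable range, the characterized field is the safer way to tell those two cases apart.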

Understanding AI in Antibody Discovery -
