Eval hackathon #2

Merged
2 changes: 1 addition & 1 deletion README.md
@@ -56,7 +56,7 @@ You can apply prompts to examples from datasets of the [Hugging Face Datasets li
INPUT: What label best describes this news article?
Carlyle Looks Toward Commercial Aerospace (Reuters) Reuters - Private investment firm Carlyle Group,\which has a reputation for making well-timed and occasionally\controversial plays in the defense industry, has quietly placed\its bets on another part of the market.
>>> print("TARGET: ", result[1])
-TARGET: Business
+TARGET: ['Business']
```

In the case that you are looking for the prompts available for a particular subset of a dataset, you should use the following syntax:
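In the README hunk, `result[1]` changes from printing `Business` to printing `['Business']`, so the target now comes back as a list. The following is a minimal sketch of that behaviour. It assumes that rendered text is split on the `|||` separator, which every template in this PR uses. `split_rendered` is a hypothetical name, not promptsource's actual API.

```python
from typing import List, Tuple

SEPARATOR = "|||"  # splits a rendered template into input and target(s)


def split_rendered(rendered: str) -> Tuple[str, List[str]]:
    """Return (input, targets) for a rendered template string.

    Hypothetical helper: the text before the first separator is the input,
    and every later segment becomes one entry in the target list.
    """
    parts = [part.strip() for part in rendered.split(SEPARATOR)]
    return parts[0], parts[1:]


prompt, targets = split_rendered("What label best describes this news article? <article> ||| Business")
print(f"TARGET: {targets}")  # TARGET: ['Business']
```

Returning a list keeps the result shape the same whether a template yields zero, one, or several targets.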
@@ -17,9 +17,9 @@ templates:
original_task: true
name: '1'
reference: ''
-2eac2f74-ec50-4ca4-9124-1fdaad7b10b8: !Template
+2eac2f74-ec50-4ca4-9124-1fdaad7b10b7: !Template
answer_choices: first ||| second
-id: 2eac2f74-ec50-4ca4-9124-1fdaad7b10b8
+id: 2eac2f74-ec50-4ca4-9124-1fdaad7b10b7
jinja: "Two sentences will follow. Is the first or second sentence more likely\
\ to be true? {% set shuffled_order = [0, 1] | random %} {% if shuffled_order\
\ == 0 %} \n1: {{sent_more}} \n2: {{sent_less}} {% else %} \n1: {{sent_less}}\
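These templates use `{% set shuffled_order = [0, 1] | random %}` to decide whether `sent_more` or `sent_less` is shown as option 1. The following is a plain-Python reading of that branch, with whitespace simplified. The target side of the templates is cut off in this hunk, so how the answer is derived from the order is an assumption. `present_pair` is a hypothetical helper.

```python
import random
from typing import Tuple


def present_pair(sent_more: str, sent_less: str, rng: random.Random) -> Tuple[str, int]:
    """Plain-Python reading of the `shuffled_order` logic (whitespace simplified).

    Returns the option text and the number (1 or 2) under which `sent_more`
    was placed; a target would have to be derived from that number.
    """
    shuffled_order = rng.choice([0, 1])  # plays the role of `[0, 1] | random`
    if shuffled_order == 0:
        return f"1: {sent_more}\n2: {sent_less}", 1
    return f"1: {sent_less}\n2: {sent_more}", 2


rng = random.Random(42)  # seeded so the same order is reproduced on every run
options, more_position = present_pair("<sent_more text>", "<sent_less text>", rng)
print(options)
print("sent_more is option", more_position)
```

The Jinja `random` filter is not seeded per example, so re-rendering the same example can flip the order. Passing an explicit, seeded generator, as in this sketch, is one way to make evaluation runs reproducible. The templates themselves do not do this.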
@@ -49,9 +49,9 @@ templates:
original_task: true
name: A_stereotype_true
reference: ''
-5b9ff951-8625-4a9b-b45e-acea13104726: !Template
+5b9ff951-8625-4a9b-b45e-acea13104725: !Template
answer_choices: 1 ||| 2
-id: 5b9ff951-8625-4a9b-b45e-acea13104726
+id: 5b9ff951-8625-4a9b-b45e-acea13104725
jinja: "Which of the provided options is more realistic? {% set shuffled_order\
\ = [0, 1] | random %} {% if shuffled_order == 0 %} \n1: {{sent_more}} \n2:\
\ {{sent_less}} {% else %} \n1: {{sent_less}} \n2: {{sent_more}} {% endif %}\
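These hunks change only the final character of each template id and of its matching map key. The diff does not say why. One plausible reason, assumed here rather than stated in the PR, is that the old ids collided with ids already used in another template file. The following standard-library sketch scans template files for ids that occur more than once. The files are passed as path-to-text pairs, and the paths are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List

# One template id per line, e.g. "  id: 2eac2f74-ec50-4ca4-9124-1fdaad7b10b7"
ID_LINE = re.compile(
    r"^[ \t]*id:[ \t]*"
    r"([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})[ \t]*$",
    re.MULTILINE,
)


def find_duplicate_ids(files: Dict[str, str]) -> Dict[str, List[str]]:
    """Map every id that occurs more than once to the paths it occurs in."""
    seen: Dict[str, List[str]] = defaultdict(list)
    for path, text in files.items():
        for match in ID_LINE.finditer(text):
            seen[match.group(1)].append(path)
    return {tid: paths for tid, paths in seen.items() if len(paths) > 1}


files = {
    "a/templates.yaml": "  id: 2eac2f74-ec50-4ca4-9124-1fdaad7b10b8\n",  # hypothetical path
    "b/templates.yaml": "  id: 2eac2f74-ec50-4ca4-9124-1fdaad7b10b8\n",  # hypothetical path
}
print(find_duplicate_ids(files))
```

A line-based scan is enough for a quick check. A stricter tool would load the YAML and read each entry's `id` field directly.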
@@ -64,9 +64,9 @@ templates:
original_task: false
name: A_stereotype_true_fr
reference: ''
-939ebb9f-afe2-4b2a-af40-adf943374da8: !Template
+939ebb9f-afe2-4b2a-af40-adf943374da7: !Template
answer_choices: 1 ||| 2
-id: 939ebb9f-afe2-4b2a-af40-adf943374da8
+id: 939ebb9f-afe2-4b2a-af40-adf943374da7
jinja: "{% set shuffled_order = [0, 1] | random %} {% if shuffled_order == 0 %}\
\ \n1: {{sent_more}} \n2: {{sent_less}} {% else %} \nSituation 1: {{sent_less}}\
\ \n2: {{sent_more}} {% endif %} \nDis moi quelle situation est la plus plausible.\
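In these template files, each entry's map key is repeated in its `id` field. The hunks above change both together, for example from `939ebb9f-afe2-4b2a-af40-adf943374da8` to `939ebb9f-afe2-4b2a-af40-adf943374da7`. The following hypothetical, standard-library-only sketch flags entries where the key and the `id` have drifted apart. It pairs each `<uuid>: !Template` line with the next `id:` line.

```python
import re
from typing import List, Optional, Tuple

UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
KEY_LINE = re.compile(rf"^[ \t]*({UUID}): !Template[ \t]*$")  # map key of an entry
ID_LINE = re.compile(rf"^[ \t]*id: ({UUID})[ \t]*$")  # the entry's id field


def key_id_mismatches(yaml_text: str) -> List[Tuple[str, str]]:
    """Return (key, id) pairs where a template's map key differs from its id.

    Line-based sketch: a real check would parse the YAML instead.
    """
    mismatches: List[Tuple[str, str]] = []
    current_key: Optional[str] = None
    for line in yaml_text.splitlines():
        key_match = KEY_LINE.match(line)
        if key_match:
            current_key = key_match.group(1)
            continue
        id_match = ID_LINE.match(line)
        if id_match and current_key is not None:
            if id_match.group(1) != current_key:
                mismatches.append((current_key, id_match.group(1)))
            current_key = None  # each key is paired with one id only
    return mismatches


half_edited = (
    "  939ebb9f-afe2-4b2a-af40-adf943374da7: !Template\n"
    "    answer_choices: 1 ||| 2\n"
    "    id: 939ebb9f-afe2-4b2a-af40-adf943374da8\n"
)
print(key_id_mismatches(half_edited))
```

Running a check like this after a bulk id edit catches the case where only one of the two occurrences was updated.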
8 changes: 0 additions & 8 deletions promptsource/templates/gsarti/flores_101/all/templates.yaml
@@ -902,14 +902,6 @@ templates:
metadata: *id001
name: translate-this-xho-ben
reference: Translate this from X to Y (Xhosa into Bengali)
-1be26707-e89a-442d-9b58-7a3a44807239: !Template
-answer_choices: null
-id: 1be26707-e89a-442d-9b58-7a3a44807239
-jinja: 'Translate this from Swahili into English: {{ sentence_swh }} ||| {{ sentence_eng
-}}'
-metadata: *id001
-name: translate-this-swh-eng
-reference: Basic translate (Swahili into English)
1c026e1a-edea-40f4-b345-792eee944933: !Template
answer_choices: null
id: 1c026e1a-edea-40f4-b345-792eee944933
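The removed `translate-this-swh-eng` entry stores its `jinja` value as a single-quoted YAML scalar split across two lines. Several xcopa entries further down, such as `choose`, do the same. In such scalars YAML folds line breaks: a break between two non-empty lines becomes a single space, and each blank line becomes a newline. The removed value therefore reads `... {{ sentence_swh }} ||| {{ sentence_eng }}`. The following is a simplified sketch of that rule. It ignores escaped quotes (`''`) and other edge cases, and `fold_single_quoted` is a hypothetical helper.

```python
from typing import List


def fold_single_quoted(lines: List[str]) -> str:
    """Simplified YAML line folding for a multi-line single-quoted scalar.

    `lines` is the scalar's content without the surrounding quotes.
    A break between two non-empty lines becomes one space; each blank line
    in between becomes one newline. Escaped quotes ('') are not handled.
    """
    out = lines[0].rstrip()
    pending_newlines = 0
    for line in lines[1:]:
        stripped = line.strip()
        if not stripped:
            pending_newlines += 1  # blank line -> newline in the value
            continue
        out += "\n" * pending_newlines if pending_newlines else " "
        out += stripped
        pending_newlines = 0
    return out


print(fold_single_quoted([
    "Translate this from Swahili into English: {{ sentence_swh }} ||| {{ sentence_eng",
    "}}",
]))
```

The same rule explains the `choose` entry: the single break before `{% endif %}` becomes a space, while each blank line before `Choose between:` and each option becomes a newline.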
211 changes: 211 additions & 0 deletions promptsource/templates/xcopa/id/templates.yaml
@@ -0,0 +1,211 @@
dataset: xcopa
subset: id
templates:
1a87b487-1570-4873-aed9-b84d2fc0476c: !Template
answer_choices: '{{choice1}} ||| {{choice2}}'
id: 1a87b487-1570-4873-aed9-b84d2fc0476c
jinja: "{{ premise }} \n\nI am hesitating between two options. Help me choose\
\ the more likely {% if question == \"cause\" %}cause: {% else %}effect: {%\
\ endif %}\n- {{choice1}}\n- {{choice2}} ||| {% if label != -1 %}{{ answer_choices[label]\
\ }}{%endif%}"
metadata: !TemplateMetadata
choices_in_prompt: true
languages:
- en
metrics:
- Accuracy
original_task: true
name: i_am_hesitating
reference: ''
336c4c72-40e3-4122-881e-8cd7a1881eec: !Template
answer_choices: '{{choice1}} ||| {{choice2}}'
id: 336c4c72-40e3-4122-881e-8cd7a1881eec
jinja: "{% if question == \"cause\" %} \n{{ premise }} Why? \"{{ answer_choices[0]\
\ }}\" or \"{{ answer_choices[1] }}\"? ||| {% if label != -1 %}{{ answer_choices[label]\
\ }}{%endif%}\n{% endif %}"
metadata: !TemplateMetadata
choices_in_prompt: true
languages:
- en
metrics:
- Accuracy
original_task: true
name: "\u2026why? C1 or C2"
reference: ''
482f0b87-e748-4e98-8cc8-a23386bc50c3: !Template
answer_choices: '{{choice1}} ||| {{choice2}}'
id: 482f0b87-e748-4e98-8cc8-a23386bc50c3
jinja: "{{ premise }} \n\nWhat's the best option?\n- {{choice1}}\n- {{choice2}}\n\
\nWe are looking for {% if question == \"cause\" %}a cause {% else %}an effect\
\ {% endif %}\n||| {% if label != -1 %}{{answer_choices[label]}}{%endif%}"
metadata: !TemplateMetadata
choices_in_prompt: true
languages:
- en
metrics:
- Accuracy
original_task: true
name: best_option
reference: ''
4a0640a5-c378-422d-879b-7490bc500c8a: !Template
answer_choices: '{{choice1}} ||| {{choice2}}'
id: 4a0640a5-c378-422d-879b-7490bc500c8a
jinja: '{{ premise }} {% if question == "cause" %}because... {% else %}so...
{% endif %}

Choose between:

- {{choice1}}

- {{choice2}} ||| {% if label != -1 %}{{ answer_choices[label] }}{%endif%}'
metadata: !TemplateMetadata
choices_in_prompt: true
languages:
- en
metrics:
- Accuracy
original_task: true
name: choose
reference: ''
78e28a66-a84c-442c-9bf7-44aa49450412: !Template
answer_choices: '{{choice1}} ||| {{choice2}}'
id: 78e28a66-a84c-442c-9bf7-44aa49450412
jinja: '{{ premise }} {% if question == "cause" %} This happened because... {%
else %} As a consequence... {% endif %}

Help me pick the more plausible option:

- {{choice1}}

- {{choice2}} ||| {% if label != -1 %}{{ answer_choices[label] }}{%endif%}'
metadata: !TemplateMetadata
choices_in_prompt: true
languages:
- en
metrics:
- Accuracy
original_task: true
name: plausible_alternatives
reference: ''
7c0b578c-214f-4dc9-a9b4-252d91691cb0: !Template
answer_choices: '{{choice1}} ||| {{choice2}}'
id: 7c0b578c-214f-4dc9-a9b4-252d91691cb0
jinja: "{% if question == \"effect\" %} \n{{ premise }} As a result, \"{{ answer_choices[0]\
\ }}\" or \"{{ answer_choices[1] }}\"? ||| {% if label != -1 %}{{ answer_choices[label]\
\ }}{%endif%}\n{% endif %}"
metadata: !TemplateMetadata
choices_in_prompt: true
languages:
- en
metrics:
- Accuracy
original_task: true
name: "\u2026As a result, C1 or C2?"
reference: ''
94b5be71-c989-4a62-96d9-a7cb042e83c7: !Template
answer_choices: '{{choice1}} ||| {{choice2}}'
id: 94b5be71-c989-4a62-96d9-a7cb042e83c7
jinja: 'Exercise: choose the most plausible alternative.


{{ premise }} {% if question == "cause" %} because... {% else %} so... {% endif
%}

- {{choice1}}

- {{choice2}} ||| {% if label != -1 %}{{ answer_choices[label] }}{%endif%}'
metadata: !TemplateMetadata
choices_in_prompt: true
languages:
- en
metrics:
- Accuracy
original_task: true
name: exercise
reference: ''
b308f6ce-673c-44c1-b84d-95a3045229ea: !Template
answer_choices: '{{choice1}} ||| {{choice2}}'
id: b308f6ce-673c-44c1-b84d-95a3045229ea
jinja: '"{{ answer_choices[0] }}" or "{{ answer_choices[1] }}"? {{ premise }}
{% if question == "cause" %} because {% else %} so {% endif %} ||| {% if label
!= -1 %}{{ answer_choices[label] }}{% endif %}'
metadata: !TemplateMetadata
choices_in_prompt: true
languages:
- en
metrics:
- Accuracy
original_task: true
name: "C1 or C2? premise, so/because\u2026"
reference: "Adapted from Perez et al. 2021 and Schick & Sch\xFCtz 2021."
cf78cf75-90cc-4fe2-8b78-2bf64c9520b4: !Template
answer_choices: '{{choice1}} ||| {{choice2}}'
id: cf78cf75-90cc-4fe2-8b78-2bf64c9520b4
jinja: '{{ premise }}


Select the most plausible {% if question == "cause" %}cause: {% else %}effect:
{% endif %}

- {{choice1}}

- {{choice2}} ||| {% if label != -1 %}{{ answer_choices[label] }}{%endif%}'
metadata: !TemplateMetadata
choices_in_prompt: true
languages:
- en
metrics:
- Accuracy
original_task: true
name: cause_effect
reference: ''
d8263afb-215f-43c4-83b8-c85744144fdb: !Template
answer_choices: '{{choice1}} ||| {{choice2}}'
id: d8263afb-215f-43c4-83b8-c85744144fdb
jinja: "{% if question == \"cause\" %} \n{{ premise }} Which may be caused by\
\ \"{{ answer_choices[0] }}\" or \"{{ answer_choices[1] }}\"? ||| {% if label\
\ != -1 %}{{ answer_choices[label] }}{%endif%}\n{% endif %}"
metadata: !TemplateMetadata
choices_in_prompt: true
languages:
- en
metrics:
- Accuracy
original_task: true
name: "\u2026which may be caused by"
reference: ''
eaddf2e0-ead4-456b-8e81-00bdcde8c7b0: !Template
answer_choices: '{{choice1}} ||| {{choice2}}'
id: eaddf2e0-ead4-456b-8e81-00bdcde8c7b0
jinja: "{% if question == \"effect\" %} \n{{ premise }} What could happen next,\
\ \"{{ answer_choices[0] }}\" or \"{{ answer_choices[1] }}\"? ||| {% if label\
\ != -1 %}{{ answer_choices[label] }}{%endif%}\n{% endif %}"
metadata: !TemplateMetadata
choices_in_prompt: true
languages:
- en
metrics:
- Accuracy
original_task: true
name: "\u2026What could happen next, C1 or C2?"
reference: ''
ebd4242a-14f2-4aed-a183-dc37a18dfe4b: !Template
answer_choices: '{{choice1}} ||| {{choice2}}'
id: ebd4242a-14f2-4aed-a183-dc37a18dfe4b
jinja: 'Pick the more likely continuation to the following sentence:

{{ premise }} {% if question == "cause" %} as a result of: {% else %} as a consequence:
{% endif %}

- {{choice1}}

- {{choice2}} ||| {% if label != -1 %}{{ answer_choices[label] }}{%endif%}'
metadata: !TemplateMetadata
choices_in_prompt: true
languages:
- en
metrics:
- Accuracy
original_task: true
name: more likely
reference: ''
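All twelve xcopa templates above share `answer_choices: '{{choice1}} ||| {{choice2}}'`, and each ends its target with a `{% if label != -1 %}...{%endif%}` guard. As a result, an example whose `label` is -1 gets an empty target. The following is a plain-Python sketch of that logic. The helper names and placeholder examples are hypothetical, not promptsource code.

```python
from typing import List


def answer_choices(example: dict) -> List[str]:
    """Render '{{choice1}} ||| {{choice2}}' and split it on the separator."""
    rendered = f"{example['choice1']} ||| {example['choice2']}"
    return [choice.strip() for choice in rendered.split("|||")]


def target(example: dict) -> str:
    """Mirror '{% if label != -1 %}{{ answer_choices[label] }}{%endif%}'."""
    if example["label"] == -1:  # no gold label: the template renders no target
        return ""
    return answer_choices(example)[example["label"]]


labeled = {"premise": "<premise>", "choice1": "<first>", "choice2": "<second>",
           "question": "cause", "label": 1}
print(target(labeled))  # <second>
print(repr(target(dict(labeled, label=-1))))  # ''
```

If a choice itself contained `|||`, the split would yield more than two entries. This sketch does not guard against that.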
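Four of the xcopa templates above apply to only one question type. `…why? C1 or C2` and `…which may be caused by` are wrapped in `{% if question == "cause" %}`. `…As a result, C1 or C2?` and `…What could happen next, C1 or C2?` are wrapped in `{% if question == "effect" %}`. For the other question type, these templates render to whitespace only. The following is a hypothetical plain-Python reading of the `…why? C1 or C2` entry, with whitespace simplified. It returns `None` for examples the template does not cover, so a caller can skip them. This page does not show how promptsource itself handles such near-empty renders.

```python
from typing import Optional, Tuple


def why_c1_or_c2(example: dict) -> Optional[Tuple[str, str]]:
    """Approximate the '…why? C1 or C2' template; None for non-cause examples."""
    if example["question"] != "cause":
        return None  # the Jinja template renders only whitespace here
    c1, c2 = example["choice1"], example["choice2"]
    prompt = f'{example["premise"]} Why? "{c1}" or "{c2}"?'
    target = [c1, c2][example["label"]] if example["label"] != -1 else ""
    return prompt, target


examples = [
    {"premise": "<premise A>", "choice1": "<c1>", "choice2": "<c2>", "question": "cause", "label": 0},
    {"premise": "<premise B>", "choice1": "<c1>", "choice2": "<c2>", "question": "effect", "label": 1},
]
rendered = [r for r in map(why_c1_or_c2, examples) if r is not None]
print(rendered)  # only the "cause" example survives
```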
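Some `name` and `reference` values above are double-quoted YAML strings containing escapes, such as `"\u2026why? C1 or C2"` and `"Schick & Sch\xFCtz 2021"`. When loaded, they read `…why? C1 or C2` and `Schick & Schütz 2021`. The following sketch decodes these two escape forms with Python's `unicode_escape` codec. This is a stand-in, not a YAML parser: it agrees with YAML for `\xNN` and `\uNNNN` but not for YAML-only escapes.

```python
def decode_yaml_escapes(text: str) -> str:
    """Decode \\xNN and \\uNNNN escapes as written in double-quoted YAML.

    Stand-in only: Python's 'unicode_escape' codec matches YAML for these
    two forms, but not for YAML-specific escapes such as '\\_' or '\\N'.
    The input must be ASCII, as the escaped strings in this file are.
    """
    return text.encode("ascii").decode("unicode_escape")


print(decode_yaml_escapes(r"Schick & Sch\xFCtz 2021"))  # Schick & Schütz 2021
print(decode_yaml_escapes(r"\u2026why? C1 or C2"))  # …why? C1 or C2
```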