
drlml


[Discussion] University of Sheffield, UK (QS Top 100) is recruiting a Computer Science PhD student; applications close at the end of October!

We are recruiting a PhD student to develop new algorithms for reinforcement learning from human feedback (RLHF), to effectively solve complex reinforcement learning tasks without a predefined reward function. The primary goal of this project will be the development of a novel RLHF framework that can learn more complex behaviours while requiring significantly less interactive human feedback than current RLHF methods. The direction of this project is highly flexible, and the student will have the opportunity to explore related directions that match their research interests. We intend for this project to explore applications of the new RLHF framework, such as fine-tuning and aligning large language models (LLMs), and the use of human feedback in robotics. The project may also explore the use of LLMs as part of the RLHF framework itself, to generate and/or interpret natural language feedback. The specific applications and research directions will depend on the student's own interests.

The preferred start date for this position is February 2026, but this is very flexible.

Supervisors: Dr. Bei Peng, Dr. Robert Loftin

Application deadline: October 31, 2025

Requirements:
1. A Bachelor's or Master's degree in Computer Science, Mathematics, or a related field.
2. Solid programming skills and mathematical background in machine learning/reinforcement learning.
3. Proficiency in programming languages such as Python and familiarity with common deep learning and machine learning frameworks.
4. Good English communication skills, with an IELTS score of 6.5 or above (with no less than 6.0 in each component).

Scholarship information:
For UK home students, this is a fully funded 3.5-year PhD studentship. For international students, you will need to pay the difference between the UK and overseas tuition fees by securing additional funding or self-funding (i.e., the PhD studentship will cover tuition fees but not living expenses).

More information and instructions for how to apply can be found here:
https://www.findaphd.com/phds/project/improving-deep-reinforcement-learning-through-interactive-human-feedback/?p186459
(When applying, make sure you name Dr. Bei Peng and Dr. Robert Loftin as your proposed supervisors.)

If you have any questions regarding the position, feel free to contact Dr. Bei Peng (bei.peng@sheffield.ac.uk).
